[jira] [Resolved] (SPARK-31060) Handle column names containing `dots` in data source `Filter`
[ https://issues.apache.org/jira/browse/SPARK-31060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31060. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Handle column names containing `dots` in data source `Filter` > - > > Key: SPARK-31060 > URL: https://issues.apache.org/jira/browse/SPARK-31060 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31060) Handle column names containing `dots` in data source `Filter`
[ https://issues.apache.org/jira/browse/SPARK-31060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31060: --- Assignee: DB Tsai > Handle column names containing `dots` in data source `Filter` > - > > Key: SPARK-31060 > URL: https://issues.apache.org/jira/browse/SPARK-31060 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31026) Parquet predicate pushdown on columns with dots
[ https://issues.apache.org/jira/browse/SPARK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31026. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Parquet predicate pushdown on columns with dots > --- > > Key: SPARK-31026 > URL: https://issues.apache.org/jira/browse/SPARK-31026 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > Parquet predicate pushdown on columns with dots was disabled in -SPARK-20364- > because Parquet's APIs don't support it. A new set of APIs is proposed in > PARQUET-1809 to generalize the support of nested columns, which can address this > issue. This implementation will be merged into the Spark repo first, until we get > a new release from the Parquet community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
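For context on SPARK-31026 above, a minimal sketch of the behavior it targets (spark-shell; the dotted column name and path are illustrative, not taken from the ticket):

{code:java}
import org.apache.spark.sql.functions.col
import spark.implicits._  // spark-shell provides `spark`

// A column whose name literally contains dots.
val df = Seq((1, "a"), (2, "b")).toDF("col.with.dots", "value")
df.write.mode("overwrite").parquet("/tmp/dotted")

// Backticks make the reference mean "the column named col.with.dots",
// not a nested field. Before this change such a predicate was not pushed
// down to Parquet; with it, the scan's PushedFilters should include it.
spark.read.parquet("/tmp/dotted")
  .filter(col("`col.with.dots`") > 1)
  .explain()
{code}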
[jira] [Resolved] (SPARK-17636) Parquet predicate pushdown for nested fields
[ https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-17636. - Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Parquet predicate pushdown for nested fields > > > Key: SPARK-17636 > URL: https://issues.apache.org/jira/browse/SPARK-17636 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 1.6.2, 1.6.3, 2.0.2 >Reporter: Mitesh >Assignee: DB Tsai >Priority: Minor > Fix For: 3.0.0 > > > There's a *PushedFilters* for a simple numeric field, but not for a numeric > field inside a struct. Not sure if this is a limitation Spark inherits from > Parquet, or a Spark-only limitation. > {noformat} > scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", > "sale_id") > res5: org.apache.spark.sql.DataFrame = [day_timestamp: > struct, sale_id: bigint] > scala> res5.filter("sale_id > 4").queryExecution.executedPlan > res9: org.apache.spark.sql.execution.SparkPlan = > Filter[23814] [args=(sale_id#86324L > > 4)][outPart=UnknownPartitioning(0)][outOrder=List()] > +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: > s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)] > scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan > res10: org.apache.spark.sql.execution.SparkPlan = > Filter[23815] [args=(day_timestamp#86302.timestamp > > 4)][outPart=UnknownPartitioning(0)][outOrder=List()] > +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: > s3a://some/parquet/file > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
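As a rough, hedged sketch of how the fix above can be checked (the path is illustrative, and the exact plan rendering depends on the Spark version):

{code:java}
import spark.implicits._  // spark-shell provides `spark`

val df = spark.read.parquet("/tmp/sales")  // hypothetical path

// Top-level predicates were already pushed down, e.g.
//   PushedFilters: [GreaterThan(sale_id,4)]
df.filter($"sale_id" > 4).explain()

// After the fix, a predicate on a nested field should also appear in the
// scan's PushedFilters instead of surviving only as a separate Filter node.
df.filter($"day_timestamp.timestamp" > 4).explain()
{code}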
[jira] [Assigned] (SPARK-31026) Parquet predicate pushdown on columns with dots
[ https://issues.apache.org/jira/browse/SPARK-31026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31026: --- Assignee: DB Tsai > Parquet predicate pushdown on columns with dots > --- > > Key: SPARK-31026 > URL: https://issues.apache.org/jira/browse/SPARK-31026 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > > Parquet predicate pushdown on columns with dots was disabled in -SPARK-20364- > because Parquet's APIs don't support it. A new set of APIs is proposed in > PARQUET-1809 to generalize the support of nested columns, which can address this > issue. This implementation will be merged into the Spark repo first, until we get > a new release from the Parquet community. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31088) Add back HiveContext and createExternalTable
[ https://issues.apache.org/jira/browse/SPARK-31088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-31088. - Fix Version/s: 3.0.0 Resolution: Fixed > Add back HiveContext and createExternalTable > > > Key: SPARK-31088 > URL: https://issues.apache.org/jira/browse/SPARK-31088 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25556) Predicate Pushdown for Nested fields
[ https://issues.apache.org/jira/browse/SPARK-25556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25556. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27728 [https://github.com/apache/spark/pull/27728] > Predicate Pushdown for Nested fields > > > Key: SPARK-25556 > URL: https://issues.apache.org/jira/browse/SPARK-25556 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > > This is an umbrella JIRA to support predicate pushdown for nested fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31086) Add Back the Deprecated SQLContext methods
[ https://issues.apache.org/jira/browse/SPARK-31086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-31086. - Fix Version/s: 3.0.0 Resolution: Fixed > Add Back the Deprecated SQLContext methods > -- > > Key: SPARK-31086 > URL: https://issues.apache.org/jira/browse/SPARK-31086 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31282) Supplement version for configurations appearing in the security doc
jiaan.geng created SPARK-31282: -- Summary: Supplement version for configurations appearing in the security doc Key: SPARK-31282 URL: https://issues.apache.org/jira/browse/SPARK-31282 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.1.0 Reporter: jiaan.geng docs/security.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31281) Hit OOM Error - GC Limit
[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongJin updated SPARK-31281: Description: MemoryStore is 2.6GB conf = new SparkConf().setAppName("test") //.set("spark.sql.codegen.wholeStage", "false") .set("spark.driver.host", "localhost") .set("spark.driver.memory", "4g") .set("spark.executor.cores","1") .set("spark.num.executors","1") .set("spark.executor.memory", "4g") .set("spark.executor.memoryOverhead", "400m") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","1") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") //enable spark UI .set("spark.sql.shuffle.partitions",defaultPartitions) .setMaster("local[2]") sparkSession = SparkSession.builder.config(conf).getOrCreate() val df = SparkFactory.sparkSession.sqlContext .read .option("header", "true") .option("delimiter", delimiter) .csv(textFileLocation) joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*) data = joinedDf.take(maxRecords) was: conf = new SparkConf().setAppName("test") //.set("spark.sql.codegen.wholeStage", "false") .set("spark.driver.host", "localhost") .set("spark.driver.memory", "4g") .set("spark.executor.cores","1") .set("spark.num.executors","1") .set("spark.executor.memory", "4g") .set("spark.executor.memoryOverhead", "400m") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","1") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") //enable spark UI .set("spark.sql.shuffle.partitions",defaultPartitions) .setMaster(numCores) sparkSession = SparkSession.builder.config(conf).getOrCreate() val df = SparkFactory.sparkSession.sqlContext .read .option("header", "true") .option("delimiter", delimiter) .csv(textFileLocation) joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*) data = joinedDf.take(maxRecords) > Hit OOM Error - GC Limit > > > Key: SPARK-31281 > URL: https://issues.apache.org/jira/browse/SPARK-31281 > Project: Spark > Issue Type: Question > Components: Java API >Affects Versions: 2.4.4 >Reporter: HongJin >Priority: Critical > > MemoryStore is 2.6GB > conf = new SparkConf().setAppName("test") > //.set("spark.sql.codegen.wholeStage", "false") > .set("spark.driver.host", "localhost") > .set("spark.driver.memory", "4g") > .set("spark.executor.cores","1") > .set("spark.num.executors","1") > .set("spark.executor.memory", "4g") > .set("spark.executor.memoryOverhead", "400m") > .set("spark.dynamicAllocation.enabled", "true") > .set("spark.dynamicAllocation.minExecutors","1") > .set("spark.dynamicAllocation.maxExecutors","2") > .set("spark.ui.enabled","true") //enable spark UI > .set("spark.sql.shuffle.partitions",defaultPartitions) > .setMaster("local[2]") > sparkSession = SparkSession.builder.config(conf).getOrCreate() > > val df = SparkFactory.sparkSession.sqlContext > .read > .option("header", "true") > .option("delimiter", delimiter) > .csv(textFileLocation) > > joinedDf = upperCaseLeft.as("l") > .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") > .select(compositeKeysCol ::: nonKeyCols.map(col => > mapHelper(col,toleranceValue,caseSensitive)): _*) > > data = joinedDf.take(maxRecords) > > > > -- This 
message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31281) Hit OOM Error - GC Limit
HongJin created SPARK-31281: --- Summary: Hit OOM Error - GC Limit Key: SPARK-31281 URL: https://issues.apache.org/jira/browse/SPARK-31281 Project: Spark Issue Type: Question Components: Java API Affects Versions: 2.4.4 Reporter: HongJin conf = new SparkConf().setAppName("test") //.set("spark.sql.codegen.wholeStage", "false") .set("spark.driver.host", "localhost") .set("spark.driver.memory", "4g") .set("spark.executor.cores","1") .set("spark.num.executors","1") .set("spark.executor.memory", "4g") .set("spark.executor.memoryOverhead", "400m") .set("spark.dynamicAllocation.enabled", "true") .set("spark.dynamicAllocation.minExecutors","1") .set("spark.dynamicAllocation.maxExecutors","2") .set("spark.ui.enabled","true") //enable spark UI .set("spark.sql.shuffle.partitions",defaultPartitions) .setMaster(numCores) sparkSession = SparkSession.builder.config(conf).getOrCreate() val df = SparkFactory.sparkSession.sqlContext .read .option("header", "true") .option("delimiter", delimiter) .csv(textFileLocation) joinedDf = upperCaseLeft.as("l") .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col,toleranceValue,caseSensitive)): _*) data = joinedDf.take(maxRecords) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31243) add ANOVATest and FValueTest to PySpark
[ https://issues.apache.org/jira/browse/SPARK-31243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-31243. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28012 [https://github.com/apache/spark/pull/28012] > add ANOVATest and FValueTest to PySpark > --- > > Key: SPARK-31243 > URL: https://issues.apache.org/jira/browse/SPARK-31243 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.1.0 > > > Add ANOVATest and FValueTest to the Python side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
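For reference, a hedged sketch of the Scala API these Python wrappers mirror (toy data; the PySpark counterparts added here are expected to take the same dataset/featuresCol/labelCol arguments, though the exact Python signatures are not quoted in the ticket):

{code:java}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.{ANOVATest, FValueTest}
import spark.implicits._  // spark-shell provides `spark`

// Toy data: a numeric label column and a feature vector column.
val df = Seq(
  (1.0, Vectors.dense(1.0, 2.0)),
  (0.0, Vectors.dense(3.0, 4.0)),
  (1.0, Vectors.dense(5.0, 6.0))
).toDF("label", "features")

// ANOVA F-test for a categorical label, F-value test for a continuous one;
// each returns a DataFrame of p-values, degrees of freedom and statistics.
ANOVATest.test(df, "features", "label").show(false)
FValueTest.test(df, "features", "label").show(false)
{code}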
[jira] [Assigned] (SPARK-31243) add ANOVATest and FValueTest to PySpark
[ https://issues.apache.org/jira/browse/SPARK-31243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31243: Assignee: Huaxin Gao > add ANOVATest and FValueTest to PySpark > --- > > Key: SPARK-31243 > URL: https://issues.apache.org/jira/browse/SPARK-31243 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > Add ANOVATest and FValueTest to the Python side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31275) Improve the metrics format in ExecutionPage for StageId
[ https://issues.apache.org/jira/browse/SPARK-31275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31275. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28039 [https://github.com/apache/spark/pull/28039] > Improve the metrics format in ExecutionPage for StageId > --- > > Key: SPARK-31275 > URL: https://issues.apache.org/jira/browse/SPARK-31275 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.0.0 > > > In ExecutionPage, the metrics for stageId and attemptId are displayed like > "stageId (attempt)" but the format "stageId.attempt" is more standard in > Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-30443) "Managed memory leak detected" even with no calls to take() or limit()
[ https://issues.apache.org/jira/browse/SPARK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068290#comment-17068290 ] Xiaoju Wu edited comment on SPARK-30443 at 3/27/20, 5:50 AM: - I also see this kind of warning log. SPARK-21492 may relate to this warning. Does your code base contain it? And I'm afraid there could be other consumers that do not release memory by themselves but instead let the task release all memory related to the taskId at the end of the task. was (Author: xiaojuwu): I also see this kind of warning log. SPARK-21492 may relate to this warning. Does your code base contain it? > "Managed memory leak detected" even with no calls to take() or limit() > -- > > Key: SPARK-30443 > URL: https://issues.apache.org/jira/browse/SPARK-30443 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.4, 3.0.0 >Reporter: Luke Richter >Priority: Major > Attachments: a.csv.zip, b.csv.zip, c.csv.zip > > > Our Spark code is causing a "Managed memory leak detected" warning to appear, > even though we are not calling take() or limit(). > According to SPARK-14168 https://issues.apache.org/jira/browse/SPARK-14168 > managed memory leaks should only be caused by not reading an iterator to > completion, i.e. take() or limit() > Our exact warning text is: "2020-01-06 14:54:59 WARN Executor:66 - Managed > memory leak detected; size = 2097152 bytes, TID = 118" > The size of the managed memory leak is always 2MB. > I have created a minimal test program that reproduces the warning: > {code:java} > import pyspark.sql > import pyspark.sql.functions as fx > def main(): > builder = pyspark.sql.SparkSession.builder > builder = builder.appName("spark-jira") > spark = builder.getOrCreate() > reader = spark.read > reader = reader.format("csv") > reader = reader.option("inferSchema", "true") > reader = reader.option("header", "true") > table_c = reader.load("c.csv") > table_a = reader.load("a.csv") > table_b = reader.load("b.csv") > primary_filter = fx.col("some_code").isNull() > new_primary_data = table_a.filter(primary_filter) > new_ids = new_primary_data.select("some_id") > new_data = table_b.join(new_ids, "some_id") > new_data = new_data.select("some_id") > result = table_c.join(new_data, "some_id", "left") > result.repartition(1).write.json("results.json", mode="overwrite") > spark.stop() > if __name__ == "__main__": > main() > {code} > Our code isn't anything out of the ordinary, just some filters, selects and > joins. > The input data is made up of 3 CSV files. The input data files are quite > large, roughly 2.6GB in total uncompressed. I attempted to reduce the number > of rows in the CSV input files but this caused the warning to no longer > appear. After compressing the files I was able to attach them below. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30443) "Managed memory leak detected" even with no calls to take() or limit()
[ https://issues.apache.org/jira/browse/SPARK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068290#comment-17068290 ] Xiaoju Wu commented on SPARK-30443: --- I also see this kind of warning log. SPARK-21492 may relate to this warning. Does your code base contain it? > "Managed memory leak detected" even with no calls to take() or limit() > -- > > Key: SPARK-30443 > URL: https://issues.apache.org/jira/browse/SPARK-30443 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.4, 3.0.0 >Reporter: Luke Richter >Priority: Major > Attachments: a.csv.zip, b.csv.zip, c.csv.zip > > > Our Spark code is causing a "Managed memory leak detected" warning to appear, > even though we are not calling take() or limit(). > According to SPARK-14168 https://issues.apache.org/jira/browse/SPARK-14168 > managed memory leaks should only be caused by not reading an iterator to > completion, i.e. take() or limit() > Our exact warning text is: "2020-01-06 14:54:59 WARN Executor:66 - Managed > memory leak detected; size = 2097152 bytes, TID = 118" > The size of the managed memory leak is always 2MB. > I have created a minimal test program that reproduces the warning: > {code:java} > import pyspark.sql > import pyspark.sql.functions as fx > def main(): > builder = pyspark.sql.SparkSession.builder > builder = builder.appName("spark-jira") > spark = builder.getOrCreate() > reader = spark.read > reader = reader.format("csv") > reader = reader.option("inferSchema", "true") > reader = reader.option("header", "true") > table_c = reader.load("c.csv") > table_a = reader.load("a.csv") > table_b = reader.load("b.csv") > primary_filter = fx.col("some_code").isNull() > new_primary_data = table_a.filter(primary_filter) > new_ids = new_primary_data.select("some_id") > new_data = table_b.join(new_ids, "some_id") > new_data = new_data.select("some_id") > result = table_c.join(new_data, "some_id", "left") > result.repartition(1).write.json("results.json", mode="overwrite") > spark.stop() > if __name__ == "__main__": > main() > {code} > Our code isn't anything out of the ordinary, just some filters, selects and > joins. > The input data is made up of 3 CSV files. The input data files are quite > large, roughly 2.6GB in total uncompressed. I attempted to reduce the number > of rows in the CSV input files but this caused the warning to no longer > appear. After compressing the files I was able to attach them below. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31280) Propagate empty relations after RewritePredicateSubquery
Kent Yao created SPARK-31280: Summary: Propagate empty relations after RewritePredicateSubquery Key: SPARK-31280 URL: https://issues.apache.org/jira/browse/SPARK-31280 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Kent Yao {code:java} scala> spark.sql(" select * from values(1), (2) t(key) where key in (select 1 as key where 1=0)").queryExecution res15: org.apache.spark.sql.execution.QueryExecution = == Parsed Logical Plan == 'Project [*] +- 'Filter 'key IN (list#39 []) : +- Project [1 AS key#38] : +- Filter (1 = 0) :+- OneRowRelation +- 'SubqueryAlias t +- 'UnresolvedInlineTable [key], [List(1), List(2)] == Analyzed Logical Plan == key: int Project [key#40] +- Filter key#40 IN (list#39 []) : +- Project [1 AS key#38] : +- Filter (1 = 0) :+- OneRowRelation +- SubqueryAlias t +- LocalRelation [key#40] == Optimized Logical Plan == Join LeftSemi, (key#40 = key#38) :- LocalRelation [key#40] +- LocalRelation , [key#38] == Physical Plan == *(1) BroadcastHashJoin [key#40], [key#38], LeftSemi, BuildRight :- *(1) LocalTableScan [key#40] +- Br... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
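A hedged illustration of the motivation (spark-shell; the expected plan shapes are illustrative): when the empty relation is visible to the optimizer directly, the join is expected to collapse, but the IN-subquery form shown above only becomes a LeftSemi join in RewritePredicateSubquery, after PropagateEmptyRelation has already run, so the empty side survives into the physical plan:

{code:java}
// Directly joining against an empty relation: PropagateEmptyRelation
// should fold the whole query into an empty LocalRelation.
spark.sql(
  """select * from values (1), (2) t(key)
    |join (select 1 as key where 1 = 0) s on t.key = s.key""".stripMargin)
  .queryExecution.optimizedPlan

// The semantically similar IN-subquery is rewritten into a LeftSemi join
// too late for that rule to fire, which is what this ticket proposes to fix.
spark.sql(
  """select * from values (1), (2) t(key)
    |where key in (select 1 as key where 1 = 0)""".stripMargin)
  .queryExecution.optimizedPlan
{code}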
[jira] [Created] (SPARK-31279) Add version information to the configuration of Hive
jiaan.geng created SPARK-31279: -- Summary: Add version information to the configuration of Hive Key: SPARK-31279 URL: https://issues.apache.org/jira/browse/SPARK-31279 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: jiaan.geng sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31204) HiveResult compatibility for DatasourceV2 command
[ https://issues.apache.org/jira/browse/SPARK-31204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31204: --- Assignee: Terry Kim > HiveResult compatibility for DatasourceV2 command > - > > Key: SPARK-31204 > URL: https://issues.apache.org/jira/browse/SPARK-31204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Terry Kim >Priority: Major > > HiveResult performs some compatibility matches and conversions for commands > to be compatible with Hive output, e.g.: > {code} > case ExecutedCommandExec(_: DescribeCommandBase) => > // If it is a describe command for a Hive table, we want to have the > output format > // be similar with Hive. > ... > // SHOW TABLES in Hive only output table names, while ours output > database, table name, isTemp. > case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended > => > {code} > It is needed for DatasourceV2 commands as well (e.g. ShowTablesExec...). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31204) HiveResult compatibility for DatasourceV2 command
[ https://issues.apache.org/jira/browse/SPARK-31204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31204. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28004 [https://github.com/apache/spark/pull/28004] > HiveResult compatibility for DatasourceV2 command > - > > Key: SPARK-31204 > URL: https://issues.apache.org/jira/browse/SPARK-31204 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > > HiveResult performs some compatibility matches and conversions for commands > to be compatible with Hive output, e.g.: > {code} > case ExecutedCommandExec(_: DescribeCommandBase) => > // If it is a describe command for a Hive table, we want to have the > output format > // be similar with Hive. > ... > // SHOW TABLES in Hive only output table names, while ours output > database, table name, isTemp. > case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended > => > {code} > It is needed for DatasourceV2 commands as well (e.g. ShowTablesExec...). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31170) Spark Cli does not respect hive-site.xml and spark.sql.warehouse.dir
[ https://issues.apache.org/jira/browse/SPARK-31170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31170. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27969 [https://github.com/apache/spark/pull/27969] > Spark Cli does not respect hive-site.xml and spark.sql.warehouse.dir > > > Key: SPARK-31170 > URL: https://issues.apache.org/jira/browse/SPARK-31170 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > In Spark CLI, we create a hive CliSessionState and it does not load the > hive-site.xml. So the configurations in hive-site.xml will not take effect > like in other spark-hive integration apps. > Also, the warehouse directory is not correctly picked. If the `default` > database does not exist, the CliSessionState will create one during the first > time it talks to the metastore. The `Location` of the default DB will be > neither the value of spark.sql.warehouse.dir nor the user-specified value of > hive.metastore.warehouse.dir, but the default value of > hive.metastore.warehouse.dir, which will always be `/user/hive/warehouse`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
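A quick, hedged sketch of how the fixed behavior can be observed from the CLI (the warehouse path is illustrative):

{noformat}
$ bin/spark-sql --conf spark.sql.warehouse.dir=/path/to/warehouse
spark-sql> DESCRIBE DATABASE default;
-- With the fix, the reported Location should sit under /path/to/warehouse
-- (or the hive-site.xml setting), not the hard-coded /user/hive/warehouse.
{noformat}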
[jira] [Resolved] (SPARK-31186) toPandas fails on simple query (collect() works)
[ https://issues.apache.org/jira/browse/SPARK-31186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31186. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28025 [https://github.com/apache/spark/pull/28025] > toPandas fails on simple query (collect() works) > > > Key: SPARK-31186 > URL: https://issues.apache.org/jira/browse/SPARK-31186 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: Michael Chirico >Assignee: L. C. Hsieh >Priority: Minor > Fix For: 3.0.0 > > > My pandas is 0.25.1. > I ran the following simple code (cross joins are enabled): > {code:python} > spark.sql(''' > select t1.*, t2.* from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').toPandas() > {code} > and got a ValueError from pandas: > > ValueError: The truth value of a Series is ambiguous. Use a.empty, > > a.bool(), a.item(), a.any() or a.all(). > Collect works fine: > {code:python} > spark.sql(''' > select * from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').collect() > # [Row(v=1, v=1), > # Row(v=1, v=2), > # Row(v=1, v=3), > # Row(v=2, v=1), > # Row(v=2, v=2), > # Row(v=2, v=3), > # Row(v=3, v=1), > # Row(v=3, v=2), > # Row(v=3, v=3)] > {code} > I imagine it's related to the duplicate column names, but this doesn't fail: > {code:python} > spark.sql("select 1 v, 1 v").toPandas() > # v v > # 0 1 1 > {code} > Also no issue for multiple rows: > spark.sql("select 1 v, 1 v union all select 1 v, 2 v").toPandas() > It also works when not using a cross join but a janky > programmatically generated union all query: > {code:python} > cond = [] > for ii in range(3): > for jj in range(3): > cond.append(f'select {ii+1} v, {jj+1} v') > spark.sql(' union all '.join(cond)).toPandas() > {code} > As near as I can tell, the output is identical to the explode output, making > this issue all the more peculiar, as I thought toPandas() is applied to the > output of collect(), so if collect() gives the same output, how can > toPandas() fail in one case and not the other? Further, the lazy DataFrame is > the same: DataFrame[v: int, v: int] in both cases. I must be missing > something. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31186) toPandas fails on simple query (collect() works)
[ https://issues.apache.org/jira/browse/SPARK-31186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31186: Assignee: L. C. Hsieh > toPandas fails on simple query (collect() works) > > > Key: SPARK-31186 > URL: https://issues.apache.org/jira/browse/SPARK-31186 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: Michael Chirico >Assignee: L. C. Hsieh >Priority: Minor > > My pandas is 0.25.1. > I ran the following simple code (cross joins are enabled): > {code:python} > spark.sql(''' > select t1.*, t2.* from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').toPandas() > {code} > and got a ValueError from pandas: > > ValueError: The truth value of a Series is ambiguous. Use a.empty, > > a.bool(), a.item(), a.any() or a.all(). > Collect works fine: > {code:python} > spark.sql(''' > select * from ( > select explode(sequence(1, 3)) v > ) t1 left join ( > select explode(sequence(1, 3)) v > ) t2 > ''').collect() > # [Row(v=1, v=1), > # Row(v=1, v=2), > # Row(v=1, v=3), > # Row(v=2, v=1), > # Row(v=2, v=2), > # Row(v=2, v=3), > # Row(v=3, v=1), > # Row(v=3, v=2), > # Row(v=3, v=3)] > {code} > I imagine it's related to the duplicate column names, but this doesn't fail: > {code:python} > spark.sql("select 1 v, 1 v").toPandas() > # v v > # 0 1 1 > {code} > Also no issue for multiple rows: > spark.sql("select 1 v, 1 v union all select 1 v, 2 v").toPandas() > It also works when not using a cross join but a janky > programmatically generated union all query: > {code:python} > cond = [] > for ii in range(3): > for jj in range(3): > cond.append(f'select {ii+1} v, {jj+1} v') > spark.sql(' union all '.join(cond)).toPandas() > {code} > As near as I can tell, the output is identical to the explode output, making > this issue all the more peculiar, as I thought toPandas() is applied to the > output of collect(), so if collect() gives the same output, how can > toPandas() fail in one case and not the other? Further, the lazy DataFrame is > the same: DataFrame[v: int, v: int] in both cases. I must be missing > something. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25641) Change the spark.shuffle.server.chunkFetchHandlerThreadsPercent default to 100
[ https://issues.apache.org/jira/browse/SPARK-25641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068232#comment-17068232 ] Dongjoon Hyun commented on SPARK-25641: --- This commit is technically reverted via SPARK-30623. > Change the spark.shuffle.server.chunkFetchHandlerThreadsPercent default to 100 > -- > > Key: SPARK-25641 > URL: https://issues.apache.org/jira/browse/SPARK-25641 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Sanket Reddy >Assignee: Sanket Reddy >Priority: Minor > Fix For: 3.0.0 > > > We want to change the default percentage for > spark.shuffle.server.chunkFetchHandlerThreadsPercent to 100. The reason is that > currently this is set to 0, which means that if server.ioThreads > 0, > the default number of threads would be 2 * #cores instead of > server.io.Threads. We want the default to be server.io.Threads in case this is > not set at all; here a default of 0 would also mean 2 * #cores -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068200#comment-17068200 ] Yuming Wang commented on SPARK-31191: - [~leishuiyu] Add spark.sql.hive.metastore.jars=maven? > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 
18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > ... 23 more > {code} > h3. 2. Finding the reason > Querying the source code, the spark jars directory has > hive-metastore-1.2.1.spark2.jar; > the 1.2.1 version matches 1.2.0, so the exception is generated > > > {code:java} > // code placeholder > private static final Map EQUIVALENT_VERSIONS = > ImmutableMap.of("0.13.1", "0.13.0", > "1.0.0", "0.14.0", > "1.0.1", "1.0.0", > "1.1.1", "1.1.0", > "1.2.1", "1.2.0" > ); > {code} > > h3. 3. Is there any solution to this problem? > One can edit hive-site.xml to set hive.metastore.schema.verification to true, but > new problems may arise > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
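Building on the suggestion above, a hedged sketch of how those properties are typically combined (assuming the Spark version in use supports metastore version 2.3.3; the version value mirrors the reporter's Hive 2.3.3):

{noformat}
$ bin/spark-sql \
    --conf spark.sql.hive.metastore.version=2.3.3 \
    --conf spark.sql.hive.metastore.jars=maven
{noformat}

This tells Spark to talk to the 2.3.3 metastore with a matching client downloaded from Maven, instead of the built-in Hive 1.2.1 client whose schema check fails here.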
[jira] [Updated] (SPARK-31276) Contrived working example that works with multiple URI file storages for Spark cluster mode
[ https://issues.apache.org/jira/browse/SPARK-31276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Huang updated SPARK-31276: -- Description: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write a small limited subset of that DataFrame back to Driver's local file system. (This is to avoid the anti-pattern of writing a large file, which is out of scope for this example. The small limited DataFrame would be some basic statistics, not the actual complete dataset.) The examples I found on the internet only use simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs YARN client mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations where a Spark program needs to deal with both local file system and YARN client mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. If there is any existing Spark documentation that provides examples traversing the different URIs in Spark YARN client mode, or a better or smarter Spark pattern or API more suited for this, I am happy to accept that as well. Thanks! was: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only use simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations where a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. If there is any existing Spark documentation that provides examples of different URIs, I am happy to accept that as well. Thanks! > Contrived working example that works with multiple URI file storages for > Spark cluster mode > --- > > Key: SPARK-31276 > URL: https://issues.apache.org/jira/browse/SPARK-31276 > Project: Spark > Issue Type: Wish > Components: Examples >Affects Versions: 2.4.5 >Reporter: Jim Huang >Priority: Major > > This Spark SQL Guide --> Data sources --> Generic Load/Save Functions > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > described a very simple "local file system load of an example file". 
> > I am looking for an example that demonstrates a workflow that exercises > different file systems. For example, > # Driver loads an input file from local file system > # Add a simple column using lit() and stores that DataFrame in cluster mode > to HDFS > # Write a small limited subset of that DataFrame back to Driver's local > file system. (This is to avoid the anti-pattern of writing a large file, which is > out of scope for this example. The small limited DataFrame would be some > basic statistics, not the actual complete dataset.) > > The examples I found on the internet only use simple paths without the > explicit URI prefixes. > Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) > was called, local stand alone vs YARN client mode. So a "filepath" will be > read/write locally (file system) vs cluster mode HDFS, without these explicit > URIs. > There are situations where a Spark program needs to deal with both local file > system and YARN client mode (big data) in the same Spark application, like > producing a summary table stored on the local file system of the driver at > the end. > If there is any existing Spark documentation that provides examples traversing > the different URIs in Spark YARN client mode, or a better or smarter Spark > pattern or API more suited for this, I am happy to accept that as well. > Thanks!
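As a hedged sketch of the workflow the reporter describes (all paths, options, and column names are illustrative; note that file:/ paths are resolved on whichever nodes run the tasks, so the small summary is collected and written on the driver with plain Java I/O rather than a distributed write):

{code:java}
import java.io.{File, PrintWriter}
import org.apache.spark.sql.functions.lit

// 1. Load an input file; the explicit URI pins the file system regardless
//    of how Spark was launched (a file:/ read requires the path to be
//    readable on the nodes where the read tasks run).
val df = spark.read.option("header", "true").csv("file:///data/input.csv")

// 2. Add a simple column with lit() and store the DataFrame on HDFS.
df.withColumn("source", lit("example"))
  .write.mode("overwrite").parquet("hdfs:///user/me/output")

// 3. Bring a small summary back to the driver and write it to the
//    driver's local file system with ordinary Java I/O, avoiding the
//    anti-pattern of a distributed write for a tiny result.
val rowCount = df.count()
val pw = new PrintWriter(new File("/tmp/summary.txt"))
try pw.println(s"rows=$rowCount") finally pw.close()
{code}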
[jira] [Updated] (SPARK-31262) A test case that imports another test case containing bracketed comments can't display the bracketed comments in golden files well.
[ https://issues.apache.org/jira/browse/SPARK-31262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31262: - Component/s: Tests > A test case that imports another test case containing bracketed comments can't display > the bracketed comments in golden files well. > -- > > Key: SPARK-31262 > URL: https://issues.apache.org/jira/browse/SPARK-31262 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > The content of > {code:java} > nested-comments.sql > {code} is shown below: > {code:java} > -- This test case just used to test imported bracketed comments. > -- the first case of bracketed comment > --QUERY-DELIMITER-START > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first; > */ > SELECT 'selected content' AS first; > --QUERY-DELIMITER-END > {code} > The test case > {code:java} > comments.sql > {code} imports > {code:java} > nested-comments.sql > {code} > as shown below: > {code:java} > --IMPORT nested-comments.sql > {code} > The output will be: > {code:java} > -- !query > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', ' > ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', > 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > /* This is the first example of bracketed comment. > ^^^ > SELECT 'ommented out content' AS first > -- !query > */ > SELECT 'selected content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > */ > ^^^ > SELECT 'selected content' AS first > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31262) A test case that imports another test case containing bracketed comments can't display the bracketed comments in golden files well.
[ https://issues.apache.org/jira/browse/SPARK-31262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-31262. -- Fix Version/s: 3.0.0 Assignee: jiaan.geng Resolution: Fixed Resolved by [https://github.com/apache/spark/pull/28018] > A test case that imports another test case containing bracketed comments can't display > the bracketed comments in golden files well. > -- > > Key: SPARK-31262 > URL: https://issues.apache.org/jira/browse/SPARK-31262 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > The content of > {code:java} > nested-comments.sql > {code} is shown below: > {code:java} > -- This test case just used to test imported bracketed comments. > -- the first case of bracketed comment > --QUERY-DELIMITER-START > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first; > */ > SELECT 'selected content' AS first; > --QUERY-DELIMITER-END > {code} > The test case > {code:java} > comments.sql > {code} imports > {code:java} > nested-comments.sql > {code} > as shown below: > {code:java} > --IMPORT nested-comments.sql > {code} > The output will be: > {code:java} > -- !query > /* This is the first example of bracketed comment. > SELECT 'ommented out content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > mismatched input '/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', > 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', > 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', > 'REVOKE', ' > ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', > 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) > == SQL == > /* This is the first example of bracketed comment. > ^^^ > SELECT 'ommented out content' AS first > -- !query > */ > SELECT 'selected content' AS first > -- !query schema > struct<> > -- !query output > org.apache.spark.sql.catalyst.parser.ParseException > extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', > 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', > 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', > 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', > 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', > 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, > pos 0) > == SQL == > */ > ^^^ > SELECT 'selected content' AS first > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31238) Incompatible ORC dates with Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31238: - Assignee: Maxim Gekk > Incompatible ORC dates with Spark 2.4 > - > > Key: SPARK-31238 > URL: https://issues.apache.org/jira/browse/SPARK-31238 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Bruce Robbins >Assignee: Maxim Gekk >Priority: Blocker > > Using Spark 2.4.5, write pre-1582 date to ORC file and then read it: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.4.5-SNAPSHOT > /_/ > > Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sql("select cast('1200-01-01' as date) > dt").write.mode("overwrite").orc("/tmp/datefile") > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-01| > +--+ > scala> :quit > {noformat} > Using Spark 3.0 (branch-3.0 at commit a934142f24), read the same file: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-08| > +--+ > scala> > {noformat} > Dates are off. > Timestamps, on the other hand, appear to work as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31238) Incompatible ORC dates with Spark 2.4
[ https://issues.apache.org/jira/browse/SPARK-31238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31238. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28016 [https://github.com/apache/spark/pull/28016] > Incompatible ORC dates with Spark 2.4 > - > > Key: SPARK-31238 > URL: https://issues.apache.org/jira/browse/SPARK-31238 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Bruce Robbins >Assignee: Maxim Gekk >Priority: Blocker > Fix For: 3.0.0 > > > Using Spark 2.4.5, write pre-1582 date to ORC file and then read it: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.4.5-SNAPSHOT > /_/ > > Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> sql("select cast('1200-01-01' as date) > dt").write.mode("overwrite").orc("/tmp/datefile") > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-01| > +--+ > scala> :quit > {noformat} > Using Spark 3.0 (branch-3.0 at commit a934142f24), read the same file: > {noformat} > $ export TZ=UTC > $ bin/spark-shell --conf spark.sql.session.timeZone=UTC > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_161) > Type in expressions to have them evaluated. > Type :help for more information. > scala> spark.read.orc("/tmp/datefile").show > +--+ > |dt| > +--+ > |1200-01-08| > +--+ > scala> > {noformat} > Dates are off. > Timestamps, on the other hand, appear to work as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30320) Insert overwrite to DataSource table with dynamic partition error when running multiple task attempts
[ https://issues.apache.org/jira/browse/SPARK-30320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068005#comment-17068005 ] koert kuipers commented on SPARK-30320: --- I believe we are seeing this issue. It shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. Pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so the task throws FileAlreadyExistsException). As a result the entire job fails. So I don't think this is just a speculative execution issue; this is a general issue with dynamic partition overwrite not being able to recover from task failure. > Insert overwrite to DataSource table with dynamic partition error when > running multiple task attempts > - > > Key: SPARK-30320 > URL: https://issues.apache.org/jira/browse/SPARK-30320 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Du Ripeng >Priority: Major > > Inserting overwrite to a DataSource table with dynamic partition might fail > when running multiple task attempts. Suppose there are a task attempt and a > speculative task attempt; the speculative attempt would raise > FileAlreadyExistsException -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
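For reference, a minimal sketch of the write pattern under discussion (table path and column names are illustrative; spark.sql.sources.partitionOverwriteMode is the config that enables dynamic overwrite):

{code:java}
import spark.implicits._  // spark-shell provides `spark`

// Dynamic mode: only partitions present in the incoming data are
// replaced, instead of the whole table being truncated first.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

val df = Seq((1, "2020-03-26"), (2, "2020-03-27")).toDF("id", "dt")

// With speculation or pre-emption, two attempts of the same task can race
// on the same deterministic output path, which is the reported failure mode.
df.write.mode("overwrite").partitionBy("dt").parquet("hdfs:///tmp/events")
{code}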
[jira] [Comment Edited] (SPARK-30320) Insert overwrite to DataSource table with dynamic partition error when running multiple task attempts
[ https://issues.apache.org/jira/browse/SPARK-30320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068005#comment-17068005 ] koert kuipers edited comment on SPARK-30320 at 3/26/20, 8:11 PM: - I believe we are seeing this issue. It shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. Pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so the task throws FileAlreadyExistsException). As a result the entire job fails. So I don't think this is just a speculative execution issue; this is a general issue with dynamic partition overwrite not being able to recover from task failure. See also SPARK-29302, which I believe is the same issue. was (Author: koert): I believe we are seeing this issue. It shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. Pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so the task throws FileAlreadyExistsException). As a result the entire job fails. So I don't think this is just a speculative execution issue; this is a general issue with dynamic partition overwrite not being able to recover from task failure. > Insert overwrite to DataSource table with dynamic partition error when > running multiple task attempts > - > > Key: SPARK-30320 > URL: https://issues.apache.org/jira/browse/SPARK-30320 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Du Ripeng >Priority: Major > > Inserting overwrite to a DataSource table with dynamic partition might fail > when running multiple task attempts. Suppose there are a task attempt and a > speculative task attempt; the speculative attempt would raise > FileAlreadyExistsException -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31278) numOutputRows shows value from last micro batch when there is no new data
Burak Yavuz created SPARK-31278: --- Summary: numOutputRows shows value from last micro batch when there is no new data Key: SPARK-31278 URL: https://issues.apache.org/jira/browse/SPARK-31278 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Burak Yavuz In Structured Streaming, we provide progress updates every 10 seconds when a stream doesn't have any new data upstream. When providing this progress though, we zero out the input information but not the output information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067972#comment-17067972 ] koert kuipers edited comment on SPARK-29302 at 3/26/20, 7:23 PM: - i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. so i dont think this is just a speculative execution issue. this is a general issue with dynamic partition overwrite not being able to recover from task failure. was (Author: koert): i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > Now, for a dynamic partition overwrite operation, the filename of a task > output is determinable. > So, if speculation is enabled, would a task conflict with its relative > speculation task? > Would the two tasks concurrent write a same file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31277) Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId`
Maxim Gekk created SPARK-31277: -- Summary: Migrate `DateTimeTestUtils` from `TimeZone` to `ZoneId` Key: SPARK-31277 URL: https://issues.apache.org/jira/browse/SPARK-31277 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Currently, Spark SQL's date-time expressions and functions are ported to the Java 8 time API, but tests still use the old time APIs. In particular, DateTimeTestUtils exposes functions that accept only TimeZone instances. This is inconvenient and CPU-consuming because TimeZone instances need to be converted to ZoneId instances via strings (zone ids). This ticket aims to replace the TimeZone parameters of DateTimeTestUtils functions with the ZoneId type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
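A minimal sketch of the conversion overhead described above, using the standard java.util and java.time APIs (the zone id is arbitrary):

{code:scala}
import java.time.ZoneId
import java.util.TimeZone

// Old-style parameter: callers hold a TimeZone, which must be converted to a
// ZoneId through its string id before java.time-based code can use it.
val tz: TimeZone = TimeZone.getTimeZone("America/Los_Angeles")
val viaString: ZoneId = ZoneId.of(tz.getID) // the repeated conversion the ticket wants to avoid

// Migrated parameter: accept a ZoneId directly, so no conversion is needed.
val zid: ZoneId = ZoneId.of("America/Los_Angeles")
{code}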
[jira] [Commented] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067972#comment-17067972 ] koert kuipers commented on SPARK-29302: --- i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > Now, for a dynamic partition overwrite operation, the filename of a task > output is determinable. > So, if speculation is enabled, would a task conflict with its relative > speculation task? > Would the two tasks concurrent write a same file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29302) dynamic partition overwrite with speculation enabled
[ https://issues.apache.org/jira/browse/SPARK-29302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067972#comment-17067972 ] koert kuipers edited comment on SPARK-29302 at 3/26/20, 7:16 PM: - i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory already exists (so task throws FileAlreadyExistsException). as a result entire job fails. was (Author: koert): i believe we are seeing this issue. it shows up in particular when pre-emption is turned on and we are using dynamic partition overwrite. pre-emption kills tasks, they get restarted, and then they fail again because the output directory alreay exsists (so task throws FileAlreadyExistsException). as a result entire job fails. > dynamic partition overwrite with speculation enabled > > > Key: SPARK-29302 > URL: https://issues.apache.org/jira/browse/SPARK-29302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > Now, for a dynamic partition overwrite operation, the filename of a task > output is determinable. > So, if speculation is enabled, would a task conflict with its relative > speculation task? > Would the two tasks concurrent write a same file? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31276) Contrived working example that works with multiple URI file storages for Spark cluster mode
[ https://issues.apache.org/jira/browse/SPARK-31276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Huang updated SPARK-31276: -- Description: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only use simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local standalone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations where a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. If there is any existing alternative Spark documentation that provides examples of different URIs, I am happy to accept that as well. Thanks! was: This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only uses simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations were a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. Thanks. > Contrived working example that works with multiple URI file storages for > Spark cluster mode > --- > > Key: SPARK-31276 > URL: https://issues.apache.org/jira/browse/SPARK-31276 > Project: Spark > Issue Type: Wish > Components: Examples >Affects Versions: 2.4.5 >Reporter: Jim Huang >Priority: Major > > This Spark SQL Guide --> Data sources --> Generic Load/Save Functions > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > described a very simple "local file system load of an example file". > > I am looking for an example that demonstrates a workflow that exercises > different file systems. For example, > # Driver loads an input file from local file system > # Add a simple column using lit() and stores that DataFrame in cluster mode > to HDFS > # Write that same final DataFrame back to Driver's local file system > > The examples I found on the internet only use simple paths without the > explicit URI prefixes.
> Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) > was called, local standalone vs cluster mode. So a "filepath" will be > read/write locally (file system) vs cluster mode HDFS, without these explicit > URIs. > There are situations where a Spark program needs to deal with both local file > system and cluster mode (big data) in the same Spark application, like > producing a summary table stored on the local file system of the driver at > the end. > If there is any existing alternative Spark documentation that provides > examples of different URIs, I am happy to accept that as well. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
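A hedged sketch of the kind of example being requested, spark-shell style, with explicit URI schemes; the paths and the HDFS namenode address are illustrative assumptions, and a file:// path must be reachable from wherever the read or write actually runs:

{code:scala}
import org.apache.spark.sql.functions.lit

// 1. Load an input file from the local file system (explicit file:// scheme).
val local = spark.read.option("header", "true").csv("file:///tmp/input.csv")

// 2. Add a simple column with lit() and store the DataFrame on HDFS
//    (explicit hdfs:// scheme, unambiguous regardless of deploy mode).
val withSource = local.withColumn("source", lit("local"))
withSource.write.mode("overwrite").parquet("hdfs://namenode:8020/user/spark/out")

// 3. Write a small summary back to the driver-side local file system.
withSource.groupBy("source").count()
  .coalesce(1)
  .write.mode("overwrite").csv("file:///tmp/summary")
{code}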
[jira] [Created] (SPARK-31276) Contrived working example that works with multiple URI file storages for Spark cluster mode
Jim Huang created SPARK-31276: - Summary: Contrived working example that works with multiple URI file storages for Spark cluster mode Key: SPARK-31276 URL: https://issues.apache.org/jira/browse/SPARK-31276 Project: Spark Issue Type: Wish Components: Examples Affects Versions: 2.4.5 Reporter: Jim Huang This Spark SQL Guide --> Data sources --> Generic Load/Save Functions [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] described a very simple "local file system load of an example file". I am looking for an example that demonstrates a workflow that exercises different file systems. For example, # Driver loads an input file from local file system # Add a simple column using lit() and stores that DataFrame in cluster mode to HDFS # Write that same final DataFrame back to Driver's local file system The examples I found on the internet only uses simple paths without the explicit URI prefixes. Without the explicit URI prefixes, the "filepath" inherits how Spark (mode) was called, local stand alone vs cluster mode. So a "filepath" will be read/write locally (file system) vs cluster mode HDFS, without these explicit URIs. There are situations were a Spark program needs to deal with both local file system and cluster mode (big data) in the same Spark application, like producing a summary table stored on the local file system of the driver at the end. Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31275) Improve the metrics format in ExecutionPage for StageId
Kousuke Saruta created SPARK-31275: -- Summary: Improve the metrics format in ExecutionPage for StageId Key: SPARK-31275 URL: https://issues.apache.org/jira/browse/SPARK-31275 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.0.0, 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In ExecutionPage, the metrics for stageId and attemptId are displayed like "stageId (attempt)" but the format "stageId.attempt" is more standard in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31274) Support .r files in 2.x version
[ https://issues.apache.org/jira/browse/SPARK-31274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurangi Saxena updated SPARK-31274: Priority: Minor (was: Trivial) > Support .r files in 2.x version > --- > > Key: SPARK-31274 > URL: https://issues.apache.org/jira/browse/SPARK-31274 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 2.3.4 >Reporter: Gaurangi Saxena >Priority: Minor > Fix For: 3.1.0 > > > Hello, > We are currently using Spark 2.3.4, which does not allow .r files in > Spark-Submit. However, the latest versions of Spark do. It is a bit difficult > for us at the moment to upgrade the Spark version we are using. > Can you point me to the Jira that added support for .r? I am not able to find > it in the issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31274) Support .r files in 2.x version
[ https://issues.apache.org/jira/browse/SPARK-31274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurangi Saxena updated SPARK-31274: Issue Type: Question (was: Bug) > Support .r files in 2.x version > --- > > Key: SPARK-31274 > URL: https://issues.apache.org/jira/browse/SPARK-31274 > Project: Spark > Issue Type: Question > Components: Input/Output >Affects Versions: 2.3.4 >Reporter: Gaurangi Saxena >Priority: Trivial > Fix For: 3.1.0 > > > Hello, > We are currently using Spark 2.3.4, which does not allow .r files in > Spark-Submit. However, the latest versions of Spark do. It is a bit difficult > for us at the moment to upgrade the Spark version we are using. > Can you point me to the Jira that added support for .r? I am not able to find > it in the issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31274) Support .r files in 2.x version
Gaurangi Saxena created SPARK-31274: --- Summary: Support .r files in 2.x version Key: SPARK-31274 URL: https://issues.apache.org/jira/browse/SPARK-31274 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 2.3.4 Reporter: Gaurangi Saxena Fix For: 3.1.0 Hello, We are currently using Spark 2.3.4, which does not allow .r files in Spark-Submit. However, the latest versions of Spark do. It is a bit difficult for us at the moment to upgrade the Spark version we are using. Can you point me to the Jira that added support for .r? I am not able to find it in the issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-30095) create function syntax has to be enhance in Doc for multiple dependent jars
[ https://issues.apache.org/jira/browse/SPARK-30095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067892#comment-17067892 ] Huaxin Gao edited comment on SPARK-30095 at 3/26/20, 5:45 PM: -- I took a look at the create function doc; it has the following syntax for jar and file: {code:java} resource_locations Specifies the list of resources that contain the implementation of the function along with its dependencies. Syntax: USING { { (JAR | FILE ) resource_uri } , ... } {code} The syntax has ... which means the preceding elements can be repeated. It seems to me that we cover the multiple jars OK. It also seems to me that we cover the file syntax OK. I am thinking of closing this ticket. was (Author: huaxingao): I took a look of the create function doc, it has the following syntax for jar and file: resource_locations Specifies the list of resources that contain the implementation of the function along with its dependencies. Syntax: USING { { (JAR | FILE ) resource_uri } , ... } The syntax has ... which means the preceding elements can be repeated. It seems to me that we cover the multiple jars OK. It also seems to me that we cover the file syntax OK. I am thinking of closing this ticket. > create function syntax has to be enhance in Doc for multiple dependent jars > > > Key: SPARK-30095 > URL: https://issues.apache.org/jira/browse/SPARK-30095 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Create Function Example and Syntax has to be enhance as below > 1. Case 1: How to use multiple dependent jars in the path while creating > function is not clear. -- Syntax to be given > 2. Case 2: What are the different schema supported like file:/// is not > updated in doc - Supported Schema to be provided -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
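For illustration, the repeated-resource syntax quoted in the comment permits statements like the following; the function name, class, and jar paths are hypothetical:

{code:scala}
// Register a function whose implementation spans multiple dependent jars,
// mixing file:/// and hdfs:// resource URIs.
spark.sql("""
  CREATE FUNCTION my_udf AS 'com.example.MyUDF'
  USING JAR 'file:///opt/udfs/my-udf.jar',
        JAR 'hdfs://namenode:8020/libs/udf-dependency.jar'
""")
{code}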
[jira] [Commented] (SPARK-30095) create function syntax has to be enhance in Doc for multiple dependent jars
[ https://issues.apache.org/jira/browse/SPARK-30095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067892#comment-17067892 ] Huaxin Gao commented on SPARK-30095: I took a look at the create function doc; it has the following syntax for jar and file: resource_locations Specifies the list of resources that contain the implementation of the function along with its dependencies. Syntax: USING { { (JAR | FILE ) resource_uri } , ... } The syntax has ... which means the preceding elements can be repeated. It seems to me that we cover the multiple jars OK. It also seems to me that we cover the file syntax OK. I am thinking of closing this ticket. > create function syntax has to be enhance in Doc for multiple dependent jars > > > Key: SPARK-30095 > URL: https://issues.apache.org/jira/browse/SPARK-30095 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Create Function Example and Syntax has to be enhance as below > 1. Case 1: How to use multiple dependent jars in the path while creating > function is not clear. -- Syntax to be given > 2. Case 2: What are the different schema supported like file:/// is not > updated in doc - Supported Schema to be provided -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31273) Support for BEGIN/COMMIT/ROLLBACK TRANSACTION in SparkSQL
Sergio Sainz created SPARK-31273: Summary: Support for BEGIN/COMMIT/ROLLBACK TRANSACTION in SparkSQL Key: SPARK-31273 URL: https://issues.apache.org/jira/browse/SPARK-31273 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.5 Reporter: Sergio Sainz Looking for support for atomic transactions: BEGIN/COMMIT/ROLLBACK. Such as here: BEGIN TRANSACTION: [https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15] COMMIT TRANSACTION: [https://docs.microsoft.com/en-us/sql/t-sql/language-elements/commit-transaction-transact-sql?view=sql-server-ver15] ROLLBACK TRANSACTION: [https://docs.microsoft.com/en-us/sql/t-sql/language-elements/rollback-transaction-transact-sql?view=sql-server-ver15] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31259) Fix log error of curRequestSize in ShuffleBlockFetcherIterator
[ https://issues.apache.org/jira/browse/SPARK-31259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31259: - Assignee: wuyi > Fix log error of curRequestSize in ShuffleBlockFetcherIterator > -- > > Key: SPARK-31259 > URL: https://issues.apache.org/jira/browse/SPARK-31259 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Minor > > The log of curRequestSize is incorrect, because curRequestSize may be the > total size of several groups of blocks while we log it for each group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31259) Fix log error of curRequestSize in ShuffleBlockFetcherIterator
[ https://issues.apache.org/jira/browse/SPARK-31259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31259. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28028 [https://github.com/apache/spark/pull/28028] > Fix log error of curRequestSize in ShuffleBlockFetcherIterator > -- > > Key: SPARK-31259 > URL: https://issues.apache.org/jira/browse/SPARK-31259 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Minor > Fix For: 3.0.0 > > > The log of curRequestSize is incorrect, because curRequestSize may be the > total size of several groups of blocks while we log it for each group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
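The logging problem can be illustrated with a simplified sketch; the names and the grouping logic below are illustrative, not the actual ShuffleBlockFetcherIterator code:

{code:scala}
import scala.collection.mutable.ArrayBuffer

var curRequestSize = 0L
val curBlocks = ArrayBuffer.empty[(String, Long)]

def createFetchRequest(blocks: Seq[(String, Long)]): Unit = {
  // Correct value to log: the size of this group alone.
  val groupSize = blocks.map(_._2).sum
  println(s"Creating fetch request of $groupSize bytes for ${blocks.size} blocks")
}

for ((blockId, size) <- Seq(("b1", 100L), ("b2", 200L), ("b3", 300L))) {
  curBlocks += ((blockId, size))
  curRequestSize += size
  if (curRequestSize >= 250L) { // a maxBytesInFlight-style threshold
    // The reported bug was logging curRequestSize at this point: it may be a
    // running total over several groups of blocks, not this group's size.
    createFetchRequest(curBlocks.toSeq)
    curBlocks.clear()
    curRequestSize = 0L
  }
}
{code}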
[jira] [Created] (SPARK-31272) Support DB2 Kerberos login in JDBC connector
Gabor Somogyi created SPARK-31272: - Summary: Support DB2 Kerberos login in JDBC connector Key: SPARK-31272 URL: https://issues.apache.org/jira/browse/SPARK-31272 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31272) Support DB2 Kerberos login in JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-31272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067781#comment-17067781 ] Gabor Somogyi commented on SPARK-31272: --- Started to work on this. > Support DB2 Kerberos login in JDBC connector > > > Key: SPARK-31272 > URL: https://issues.apache.org/jira/browse/SPARK-31272 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-31191) Spark SQL and hive metastore are incompatible
[ https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leishuiyu reopened SPARK-31191: --- > Spark SQL and hive metastore are incompatible > - > > Key: SPARK-31191 > URL: https://issues.apache.org/jira/browse/SPARK-31191 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: the spark version 2.3.0 > the hive version 2.3.3 >Reporter: leishuiyu >Priority: Major > Attachments: image-2020-03-23-21-37-17-663.png > > > # > h3. When I execute bin/spark-sql, an exception occurs > > {code:java} > Caused by: java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > ... 18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does > not match metastore's schema version 2.3.0 Metastore is not upgraded or > corrupt) at > org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679) > at > org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) > at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) > ...
23 more > {code} > h3. 2. Find the reason > Querying the source code: the spark jars directory contains > hive-metastore-1.2.1.spark2.jar, and > the 1.2.1 version matches 1.2.0, so the exception is generated. > > > {code:java} > // code placeholder > private static final Map<String, String> EQUIVALENT_VERSIONS = > ImmutableMap.of("0.13.1", "0.13.0", > "1.0.0", "0.14.0", > "1.0.1", "1.0.0", > "1.1.1", "1.1.0", > "1.2.1", "1.2.0" > ); > {code} > > h3. 3. Is there any solution to this problem > You can edit hive-site.xml and set hive.metastore.schema.verification to true, but > new problems may arise > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29154) Update Spark scheduler for stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-29154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-29154. --- Fix Version/s: 3.1.0 Assignee: Thomas Graves Resolution: Fixed > Update Spark scheduler for stage level scheduling > - > > Key: SPARK-29154 > URL: https://issues.apache.org/jira/browse/SPARK-29154 > Project: Spark > Issue Type: Story > Components: Scheduler >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Make the changes to DAGScheduler, stage, task set manager, task scheduler to > support scheduling based on the resource profiles. Note that the logic to > merge profiles has a separate jira. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
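Usage of the stage-level scheduling enabled by this work looks roughly like the following, spark-shell style; the resource amounts are illustrative:

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Describe what the stage needs from executors and per task.
val execReqs = new ExecutorResourceRequests().cores(4).memory("8g")
val taskReqs = new TaskResourceRequests().cpus(2)
val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build()

// Attach the profile to an RDD; the scheduler changes in this ticket make the
// DAGScheduler/TaskSetManager/TaskScheduler place tasks only on executors
// that satisfy the attached profile.
val rdd = sc.parallelize(1 to 100).withResources(profile)
rdd.map(_ * 2).collect()
{code}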
[jira] [Resolved] (SPARK-31263) Enable yarn shuffle service to close the idle connections
[ https://issues.apache.org/jira/browse/SPARK-31263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwang resolved SPARK-31263. - Resolution: Duplicate > Enable yarn shuffle service to close the idle connections > -- > > Key: SPARK-31263 > URL: https://issues.apache.org/jira/browse/SPARK-31263 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: feiwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31201) add an individual config for skewed partition threshold
[ https://issues.apache.org/jira/browse/SPARK-31201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31201. -- Target Version/s: 3.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/27967 > add an individual config for skewed partition threshold > --- > > Key: SPARK-31201 > URL: https://issues.apache.org/jira/browse/SPARK-31201 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
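A usage sketch of the resulting knob; the config key below is the one this change added for 3.0 as far as I can tell, so treat the exact name as an assumption and check the docs for your version:

{code:scala}
// Tune the skew detection threshold independently of the advisory partition size.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256m")
{code}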
[jira] [Created] (SPARK-31271) fix web ui for driver side SQL metrics
Wenchen Fan created SPARK-31271: --- Summary: fix web ui for driver side SQL metrics Key: SPARK-31271 URL: https://issues.apache.org/jira/browse/SPARK-31271 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
[ https://issues.apache.org/jira/browse/SPARK-31268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067661#comment-17067661 ] angerszhu commented on SPARK-31268: --- raise a pr soon > TaskEnd event with zero Executor Metrics when task duration less than poll > interval > --- > > Key: SPARK-31268 > URL: https://issues.apache.org/jira/browse/SPARK-31268 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > > TaskEnd event with zero Executor Metrics when task duration less than poll > interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31270) Expose executor memory metrics at the task detail, in the Stages tab
[ https://issues.apache.org/jira/browse/SPARK-31270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067657#comment-17067657 ] angerszhu commented on SPARK-31270: --- Raise a pr soon > Expose executor memory metrics at the task detail, in the Stages tab > --- > > Key: SPARK-31270 > URL: https://issues.apache.org/jira/browse/SPARK-31270 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31270) Expose executor memory metrics at the task detail, in the Stages tab
angerszhu created SPARK-31270: - Summary: Expose executor memory metrics at the task detail, in the Stages tab Key: SPARK-31270 URL: https://issues.apache.org/jira/browse/SPARK-31270 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31269) Supplement version for configuration only appear in configuration doc
jiaan.geng created SPARK-31269: -- Summary: Supplement version for configuration only appear in configuration doc Key: SPARK-31269 URL: https://issues.apache.org/jira/browse/SPARK-31269 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.1.0 Reporter: jiaan.geng The configuration doc contains some configs that are not organized by ConfigEntry. We need to supplement the version for configurations that only appear in the configuration doc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
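For configs that are organized by ConfigEntry, the version is declared in Spark's internal builder DSL; a hedged sketch of what that looks like (private[spark] API, illustrative key):

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

object ExampleConfigs {
  // A ConfigEntry carries its version as metadata; doc-only configs lack this
  // and need their versions supplemented by hand in the documentation.
  val MY_FLAG = ConfigBuilder("spark.example.myFlag")
    .doc("Illustrative flag showing where version metadata is declared.")
    .version("3.1.0")
    .booleanConf
    .createWithDefault(false)
}
{code}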
[jira] [Updated] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
[ https://issues.apache.org/jira/browse/SPARK-31268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31268: -- Attachment: screenshot-1.png > TaskEnd event with zero Executor Metrics when task duration less than poll > interval > --- > > Key: SPARK-31268 > URL: https://issues.apache.org/jira/browse/SPARK-31268 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > > TaskEnd event with zero Executor Metrics when task duration less than poll > interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
angerszhu created SPARK-31268: - Summary: TaskEnd event with zero Executor Metrics when task duration less than poll interval Key: SPARK-31268 URL: https://issues.apache.org/jira/browse/SPARK-31268 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31268) TaskEnd event with zero Executor Metrics when task duration less than poll interval
[ https://issues.apache.org/jira/browse/SPARK-31268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31268: -- Description: TaskEnd event with zero Executor Metrics when task duration less than poll interval > TaskEnd event with zero Executor Metrics when task duration less than poll > interval > --- > > Key: SPARK-31268 > URL: https://issues.apache.org/jira/browse/SPARK-31268 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > TaskEnd event with zero Executor Metrics when task duration less than poll > interval -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31228) Add version information to the configuration of Kafka
[ https://issues.apache.org/jira/browse/SPARK-31228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31228. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 27989 [https://github.com/apache/spark/pull/27989] > Add version information to the configuration of Kafka > - > > Key: SPARK-31228 > URL: https://issues.apache.org/jira/browse/SPARK-31228 > Project: Spark > Issue Type: Sub-task > Components: DStreams >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.1.0 > > > external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala > external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31228) Add version information to the configuration of Kafka
[ https://issues.apache.org/jira/browse/SPARK-31228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31228: Assignee: jiaan.geng > Add version information to the configuration of Kafka > - > > Key: SPARK-31228 > URL: https://issues.apache.org/jira/browse/SPARK-31228 > Project: Spark > Issue Type: Sub-task > Components: DStreams >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/package.scala > external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31242) Clone SparkSession should respect spark.sql.legacy.sessionInitWithConfigDefaults
[ https://issues.apache.org/jira/browse/SPARK-31242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31242. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28014 [https://github.com/apache/spark/pull/28014] > Clone SparkSession should respect > spark.sql.legacy.sessionInitWithConfigDefaults > > > Key: SPARK-31242 > URL: https://issues.apache.org/jira/browse/SPARK-31242 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > In SQL tests, a conf specified by `withSQLConf` can be reverted to the "original > value" after cloning a SparkSession if the "original value" is already set at the > SparkConf level, because `WithTestConf` doesn't respect > spark.sql.legacy.sessionInitWithConfigDefaults and always merges SQLConf with > SparkConf. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31242) Clone SparkSession should respect spark.sql.legacy.sessionInitWithConfigDefaults
[ https://issues.apache.org/jira/browse/SPARK-31242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31242: --- Assignee: wuyi > Clone SparkSession should respect > spark.sql.legacy.sessionInitWithConfigDefaults > > > Key: SPARK-31242 > URL: https://issues.apache.org/jira/browse/SPARK-31242 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > In SQL tests, a conf specified by `withSQLConf` can be reverted to the "original > value" after cloning a SparkSession if the "original value" is already set at the > SparkConf level, because `WithTestConf` doesn't respect > spark.sql.legacy.sessionInitWithConfigDefaults and always merges SQLConf with > SparkConf. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31267) Flaky test: WholeStageCodegenSparkSubmitSuite.Generated code on driver should not embed platform-specific constant
Gabor Somogyi created SPARK-31267: - Summary: Flaky test: WholeStageCodegenSparkSubmitSuite.Generated code on driver should not embed platform-specific constant Key: SPARK-31267 URL: https://issues.apache.org/jira/browse/SPARK-31267 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120363/testReport/ {code} Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to failAfter did not complete within 1 minute. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to failAfter did not complete within 1 minute. at java.lang.Thread.getStackTrace(Thread.java:1559) at org.scalatest.concurrent.TimeLimits.failAfterImpl(TimeLimits.scala:234) at org.scalatest.concurrent.TimeLimits.failAfterImpl$(TimeLimits.scala:233) at org.apache.spark.deploy.SparkSubmitSuite$.failAfterImpl(SparkSubmitSuite.scala:1416) at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:230) at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:229) at org.apache.spark.deploy.SparkSubmitSuite$.failAfter(SparkSubmitSuite.scala:1416) at org.apache.spark.deploy.SparkSubmitSuite$.runSparkSubmit(SparkSubmitSuite.scala:1435) at org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.$anonfun$new$1(WholeStageCodegenSparkSubmitSuite.scala:53) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at 
org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.c
[jira] [Created] (SPARK-31266) Flaky test: KafkaDataConsumerSuite.SPARK-25151 Handles multiple tasks in executor fetching same (topic, partition) pair and same offset (edge-case) - data not in use
Gabor Somogyi created SPARK-31266: - Summary: Flaky test: KafkaDataConsumerSuite.SPARK-25151 Handles multiple tasks in executor fetching same (topic, partition) pair and same offset (edge-case) - data not in use Key: SPARK-31266 URL: https://issues.apache.org/jira/browse/SPARK-31266 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120363/testReport/ {code} Error Message java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms. Stacktrace sbt.ForkMain$ForkError: java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms. at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) at org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$sendMessages$3(KafkaTestUtils.scala:425) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.kafka010.KafkaTestUtils.sendMessages(KafkaTestUtils.scala:424) at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumerSuite.prepareTestTopicHavingTestMessages(KafkaDataConsumerSuite.scala:377) at org.apache.spark.sql.kafka010.consumer.KafkaDataConsumerSuite.$anonfun$new$17(KafkaDataConsumerSuite.scala:320) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at 
org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAf
[jira] [Updated] (SPARK-31234) ResetCommand should not wipe out all configs
[ https://issues.apache.org/jira/browse/SPARK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-31234: - Issue Type: Bug (was: Improvement) > ResetCommand should not wipe out all configs > > > Key: SPARK-31234 > URL: https://issues.apache.org/jira/browse/SPARK-31234 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > Currently, ResetCommand clears all configurations, including sql configs, > static sql configs and spark context level configs. > for example: > ``` > spark-sql> set xyz=abc; > xyz abc > spark-sql> set; > spark.app.id local-1585055396930 > spark.app.name SparkSQL::10.242.189.214 > spark.driver.host 10.242.189.214 > spark.driver.port 65094 > spark.executor.id driver > spark.jars > spark.master local[*] > spark.sql.catalogImplementation hive > spark.sql.hive.version 1.2.1 > spark.submit.deployMode client > xyz abc > spark-sql> reset; > spark-sql> set; > spark-sql> set spark.sql.hive.version; > spark.sql.hive.version 1.2.1 > spark-sql> set spark.app.id; > spark.app.id > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31254) `HiveResult.toHiveString` does not use the current session time zone
[ https://issues.apache.org/jira/browse/SPARK-31254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31254. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28024 [https://github.com/apache/spark/pull/28024] > `HiveResult.toHiveString` does not use the current session time zone > > > Key: SPARK-31254 > URL: https://issues.apache.org/jira/browse/SPARK-31254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, date/timestamp formatters in `HiveResult.toHiveString` are > initialized once on instantiation of the `HiveResult` object, and pick up the > session time zone. If the session's time zone is changed, the formatters still > use the previous one. > See the discussion at > https://github.com/apache/spark/pull/23391#discussion_r397347820 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
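The stale-formatter behavior can be shown with plain java.time; this is a simplified sketch, not the actual HiveResult code:

{code:scala}
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

var sessionTimeZone = "UTC" // stand-in for spark.sql.session.timeZone

// Captured once at initialization: keeps UTC even after the session changes.
val staleFormatter =
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of(sessionTimeZone))

sessionTimeZone = "America/Los_Angeles"

// Fix: resolve the formatter per call so it tracks the current session zone.
def currentFormatter =
  DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of(sessionTimeZone))

val ts = Instant.parse("2020-03-26T12:00:00Z")
println(staleFormatter.format(ts))   // still rendered in UTC
println(currentFormatter.format(ts)) // rendered in America/Los_Angeles
{code}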
[jira] [Assigned] (SPARK-31254) `HiveResult.toHiveString` does not use the current session time zone
[ https://issues.apache.org/jira/browse/SPARK-31254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31254: --- Assignee: Maxim Gekk > `HiveResult.toHiveString` does not use the current session time zone > > > Key: SPARK-31254 > URL: https://issues.apache.org/jira/browse/SPARK-31254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, date/timestamp formatters in `HiveResult.toHiveString` are > initialized once on instantiation of the `HiveResult` object, and pick up the > session time zone. If the session's time zone is changed, the formatters still > use the previous one. > See the discussion at > https://github.com/apache/spark/pull/23391#discussion_r397347820 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31147) forbid CHAR type in non-Hive-Serde tables
[ https://issues.apache.org/jira/browse/SPARK-31147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31147: Fix Version/s: (was: 3.1.0) 3.0.0 > forbid CHAR type in non-Hive-Serde tables > - > > Key: SPARK-31147 > URL: https://issues.apache.org/jira/browse/SPARK-31147 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31227) Non-nullable null type should not coerce to nullable type
[ https://issues.apache.org/jira/browse/SPARK-31227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31227. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27991 [https://github.com/apache/spark/pull/27991] > Non-nullable null type should not coerce to nullable type > - > > Key: SPARK-31227 > URL: https://issues.apache.org/jira/browse/SPARK-31227 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.0.0 > > > {code} > scala> spark.range(10).selectExpr("array()").printSchema() > root > |-- array(): array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array(), array(1)) as > arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: integer (containsNull = true) > {code} > The last case should not coerce to nullable type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31227) Non-nullable null type should not coerce to nullable type
[ https://issues.apache.org/jira/browse/SPARK-31227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31227: --- Assignee: Hyukjin Kwon > Non-nullable null type should not coerce to nullable type > - > > Key: SPARK-31227 > URL: https://issues.apache.org/jira/browse/SPARK-31227 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > > {code} > scala> spark.range(10).selectExpr("array()").printSchema() > root > |-- array(): array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: null (containsNull = false) > scala> spark.range(10).selectExpr("concat(array(), array(1)) as > arr").printSchema() > root > |-- arr: array (nullable = false) > ||-- element: integer (containsNull = true) > {code} > The last case should not coerce to nullable type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31234) ResetCommand should not wipe out all configs
[ https://issues.apache.org/jira/browse/SPARK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31234. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28003 [https://github.com/apache/spark/pull/28003] > ResetCommand should not wipe out all configs > > > Key: SPARK-31234 > URL: https://issues.apache.org/jira/browse/SPARK-31234 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > Currently, ResetCommand clears all configurations, including SQL configs, > static SQL configs, and Spark context-level configs. > For example: > ``` > spark-sql> set xyz=abc; > xyz abc > spark-sql> set; > spark.app.id local-1585055396930 > spark.app.name SparkSQL::10.242.189.214 > spark.driver.host 10.242.189.214 > spark.driver.port 65094 > spark.executor.id driver > spark.jars > spark.master local[*] > spark.sql.catalogImplementation hive > spark.sql.hive.version 1.2.1 > spark.submit.deployMode client > xyz abc > spark-sql> reset; > spark-sql> set; > spark-sql> set spark.sql.hive.version; > spark.sql.hive.version 1.2.1 > spark-sql> set spark.app.id; > spark.app.id > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
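For illustration, a hedged sketch of the intended behavior in spark-shell (which configs RESET should preserve is summarized from the description above, not quoted from the patch): runtime SQL configs go back to their defaults, while static SQL configs and SparkContext-level configs such as spark.app.id survive.
{code}
// A runtime SQL config is set and then reset back to its default...
spark.sql("SET spark.sql.shuffle.partitions=10")
spark.sql("RESET")
spark.sql("SET spark.sql.shuffle.partitions").show(truncate = false)

// ...but context-level configs should not be wiped out by RESET:
spark.sql("SET spark.app.id").show(truncate = false)
{code}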
[jira] [Assigned] (SPARK-31234) ResetCommand should not wipe out all configs
[ https://issues.apache.org/jira/browse/SPARK-31234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31234: --- Assignee: Kent Yao > ResetCommand should not wipe out all configs > > > Key: SPARK-31234 > URL: https://issues.apache.org/jira/browse/SPARK-31234 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Currently, ResetCommand clears all configurations, including SQL configs, > static SQL configs, and Spark context-level configs. > For example: > ``` > spark-sql> set xyz=abc; > xyz abc > spark-sql> set; > spark.app.id local-1585055396930 > spark.app.name SparkSQL::10.242.189.214 > spark.driver.host 10.242.189.214 > spark.driver.port 65094 > spark.executor.id driver > spark.jars > spark.master local[*] > spark.sql.catalogImplementation hive > spark.sql.hive.version 1.2.1 > spark.submit.deployMode client > xyz abc > spark-sql> reset; > spark-sql> set; > spark-sql> set spark.sql.hive.version; > spark.sql.hive.version 1.2.1 > spark-sql> set spark.app.id; > spark.app.id > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31247) Flaky test: KafkaContinuousSourceSuite.assign from latest offsets (failOnDataLoss: false)
[ https://issues.apache.org/jira/browse/SPARK-31247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31247: - Description: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120336/testReport/ {code} Error Message org.scalatest.exceptions.TestFailedException: Error adding data: Timeout after waiting for 1 ms. org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:78) org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30) org.apache.spark.sql.kafka010.KafkaTestUtils.$anonfun$sendMessages$3(KafkaTestUtils.scala:425) scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) scala.collection.TraversableLike.map(TraversableLike.scala:238) scala.collection.TraversableLike.map$(TraversableLike.scala:231) scala.collection.AbstractTraversable.map(Traversable.scala:108) == Progress ==AssertOnQuery(, )AddKafkaData(topics = Set(topic-13), data = WrappedArray(1, 2, 3), message = )CheckAnswer: [2],[3],[4]StopStream StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@1f1a9495,Map(),null) CheckAnswer: [2],[3],[4]StopStreamAddKafkaData(topics = Set(topic-13), data = WrappedArray(4, 5, 6), message = ) StartStream(ContinuousTrigger(1000),org.apache.spark.util.SystemClock@2b3bec2c,Map(),null) CheckAnswer: [2],[3],[4],[5],[6],[7] => AddKafkaData(topics = Set(topic-13), data = WrappedArray(7, 8), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9]AssertOnQuery(, Add partitions) AddKafkaData(topics = Set(topic-13), data = WrappedArray(9, 10, 11, 12, 13, 14, 15, 16), message = )CheckAnswer: [2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17] == Stream == Output Mode: Append Stream state: {KafkaSource[Assign[topic-13-4, topic-13-3, topic-13-2, topic-13-1, topic-13-0]]: {"topic-13":{"2":2,"4":2,"1":1,"3":1,"0":1}}} Thread state: alive Thread stack trace: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258) scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:187) org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:336) org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:746) org.apache.spark.SparkContext.runJob(SparkContext.scala:2104) org.apache.spark.SparkContext.runJob(SparkContext.scala:2125) org.apache.spark.SparkContext.runJob(SparkContext.scala:2144) org.apache.spark.SparkContext.runJob(SparkContext.scala:2169) org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1006) org.apache.spark.rdd.RDD$$Lambda$2999/724038556.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) 
org.apache.spark.rdd.RDD.withScope(RDD.scala:390) org.apache.spark.rdd.RDD.collect(RDD.scala:1005) org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec.doExecute(WriteToContinuousDataSourceExec.scala:57) org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) org.apache.spark.sql.execution.SparkPlan$$Lambda$2791/4135277.apply(Unknown Source) org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) org.apache.spark.sql.execution.SparkPlan$$Lambda$2823/504830038.apply(Unknown Source) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution.$anonfun$runContinuous$4(ContinuousExecution.scala:256) org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$Lambda$2765/297007729.apply(Unknown Source) org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) org.apache.spark.sql.execution.SQLExecution$$$Lambda$2773/697863343.apply(Unknown Source) org.apache.spark.sql.execution.SQLExecution$.withSQLCon
[jira] [Updated] (SPARK-31252) Flaky test: ElementTrackingStoreSuite.asynchronous tracking single-fire
[ https://issues.apache.org/jira/browse/SPARK-31252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-31252: - Description: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120353/testReport {code} Error Message org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. Stacktrace sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 1 times over 230.305107 milliseconds. Last failure message: false did not equal true. at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) at org.apache.spark.status.ElementTrackingStoreSuite.eventually(ElementTrackingStoreSuite.scala:31) at org.apache.spark.status.ElementTrackingStoreSuite.$anonfun$new$1(ElementTrackingStoreSuite.scala:64) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at 
org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: false did not equal true at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:343) at org.scalatest.Matchers$AnyShouldWrapper.shouldEqual(Matchers.scala:6797) at org.apache.spark.status.ElementTrackingStoreSuite.$anonfun$new$3(ElementTrackingStoreSuite.scala:65)
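The failure above is `eventually` giving up after a single attempt in roughly 230 ms. A hedged sketch of the usual remedy for this kind of flakiness (illustrative; whether the actual fix adjusted the patience or the test logic is not stated here): give `eventually` an explicit, more generous patience so slow CI machines get more than one attempt.
{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.{Millis, Seconds, Span}

// Explicit patience: retry for up to 10 seconds, checking every 100 ms,
// instead of relying on a short default patience.
eventually(timeout(Span(10, Seconds)), interval(Span(100, Millis))) {
  // the asynchronous condition under test, e.g.:
  // flushed shouldEqual true
}
{code}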