[jira] [Updated] (SPARK-48667) Arrow Python UDFs don't support UDT as output type
[ https://issues.apache.org/jira/browse/SPARK-48667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48667: -- Description: {code:java} df.select(udf(lambda x: x, returnType=ExamplePointUDT(), useArrow=useArrow)("point")), {code} {code:java} java.lang.AssertionError: assertion failed: Invalid schema from pandas_udf: expected org.apache.spark.sql.test.ExamplePointUDT@49ccc723, StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)), got ArrayType(DoubleType,false) {code} was: {code:java} df.select(udf(lambda x: x, returnType=ExamplePointUDT(), useArrow=useArrow)("point")), {code} {code:java} java.lang.AssertionError: assertion failed: Invalid schema from pandas_udf: expected BooleanType, LongType, StringType, StringType, DateType, TimestampType, DayTimeIntervalType(0,3), DoubleType, ArrayType(LongType,true), BinaryType, StructType(StructField(x,LongType,true)), org.apache.spark.sql.test.ExamplePointUDT@49ccc723, StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)), got BooleanType, LongType, StringType, StringType, DateType, TimestampType, DayTimeIntervalType(0,3), DoubleType, ArrayType(LongType,true), BinaryType, StructType(StructField(x,LongType,true)), ArrayType(DoubleType,false), StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)) {code} > Arrow Python UDFs don't support UDT as output type > --- > > Key: SPARK-48667 > URL: https://issues.apache.org/jira/browse/SPARK-48667 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > > {code:java} > df.select(udf(lambda x: x, returnType=ExamplePointUDT(), > useArrow=useArrow)("point")), {code} > > {code:java} > java.lang.AssertionError: assertion failed: Invalid schema from pandas_udf: > expected org.apache.spark.sql.test.ExamplePointUDT@49ccc723, > StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)), > got ArrayType(DoubleType,false) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
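For readers without the Spark test suite at hand, the failing shape can be sketched as below. This is a minimal sketch, not the project's test code: it assumes a local SparkSession and stands in for org.apache.spark.sql.test.ExamplePointUDT with an inline Python UDT (the PointUDT class and its serialize/deserialize are illustrative). The point is that declaring a UDT as the return type of an Arrow-enabled Python UDF trips the pandas_udf schema assertion above, while useArrow=False works.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, DoubleType, UserDefinedType


class PointUDT(UserDefinedType):
    """Toy stand-in for ExamplePointUDT, physically stored as array<double>."""

    @classmethod
    def sqlType(cls):
        return ArrayType(DoubleType(), False)

    @classmethod
    def module(cls):
        return "__main__"

    def serialize(self, obj):
        return list(obj)

    def deserialize(self, datum):
        return tuple(datum)


spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1.0, 2.0],)], ["point"])

# useArrow=True routes through the Arrow-optimized Python UDF path, where the
# declared UDT return type is compared against its physical ArrayType and the
# assertion fires; with useArrow=False the same UDF runs fine.
identity = udf(lambda x: x, returnType=PointUDT(), useArrow=True)
df.select(identity("point")).show()
{code}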
[jira] [Updated] (SPARK-48667) Arrow Python UDFs don't support UDT as output type
[ https://issues.apache.org/jira/browse/SPARK-48667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48667: -- Description: {code:java} df.select(udf(lambda x: x, returnType=ExamplePointUDT(), useArrow=useArrow)("point")), {code} {code:java} java.lang.AssertionError: assertion failed: Invalid schema from pandas_udf: expected BooleanType, LongType, StringType, StringType, DateType, TimestampType, DayTimeIntervalType(0,3), DoubleType, ArrayType(LongType,true), BinaryType, StructType(StructField(x,LongType,true)), org.apache.spark.sql.test.ExamplePointUDT@49ccc723, StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)), got BooleanType, LongType, StringType, StringType, DateType, TimestampType, DayTimeIntervalType(0,3), DoubleType, ArrayType(LongType,true), BinaryType, StructType(StructField(x,LongType,true)), ArrayType(DoubleType,false), StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)) {code} > Arrow Python UDFs don't support UDT as output type > --- > > Key: SPARK-48667 > URL: https://issues.apache.org/jira/browse/SPARK-48667 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > > {code:java} > df.select(udf(lambda x: x, returnType=ExamplePointUDT(), > useArrow=useArrow)("point")), {code} > > {code:java} > java.lang.AssertionError: assertion failed: Invalid schema from pandas_udf: > expected BooleanType, LongType, StringType, StringType, DateType, > TimestampType, DayTimeIntervalType(0,3), DoubleType, > ArrayType(LongType,true), BinaryType, > StructType(StructField(x,LongType,true)), > org.apache.spark.sql.test.ExamplePointUDT@49ccc723, > StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)), > got BooleanType, LongType, StringType, StringType, DateType, TimestampType, > DayTimeIntervalType(0,3), DoubleType, ArrayType(LongType,true), BinaryType, > StructType(StructField(x,LongType,true)), ArrayType(DoubleType,false), > StructType(StructField(st,StructType(StructField(tt,TimestampType,true)),true)) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48667) Arrow Python UDFs don't support UDT as output type
angerszhu created SPARK-48667: - Summary: Arrow Python UDFs don't support UDT as output type Key: SPARK-48667 URL: https://issues.apache.org/jira/browse/SPARK-48667 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.3, 3.5.1 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48292: -- Summary: Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status (was: Improve stage failure reason message in OutputCommitCoordinator ) > Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage > when committed file not consistent with task status > -- > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Priority: Minor > > When a task attempt fails but it is authorized to do the task commit, > OutputCommitCoordinator will fail the stage with a reason message > saying that the task commit succeeded, but the driver actually never knows whether a > task commit succeeded or not. We should update the reason message to make > it less confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
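To make the ambiguity concrete, here is a toy model of the commit-authorization handshake described above. This is illustrative Python, not Spark's OutputCommitCoordinator: the function name and the dict-based bookkeeping are assumptions; only the decision logic (the first attempt to ask is authorized, later attempts are denied) mirrors the description.

{code:python}
# (stage, partition) -> the task attempt that was authorized to commit.
authorized: dict = {}


def can_commit(stage: int, partition: int, attempt: int) -> bool:
    """First attempt to ask wins; every later attempt is denied."""
    return authorized.setdefault((stage, partition), attempt) == attempt


assert can_commit(0, 0, attempt=1)       # attempt 1 is authorized to commit
# Attempt 1 now fails *after* authorization. From the driver's point of view
# the commit may or may not have completed before the failure, so the stage
# must be aborted; reporting "task commit success" would overstate what the
# driver actually knows.
assert not can_commit(0, 0, attempt=2)   # attempt 2 is denied
{code}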
[jira] [Updated] (SPARK-48340) TimestampNTZ schema inference misses prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48340: -- Attachment: image-2024-05-20-18-38-39-769.png > TimestampNTZ schema inference misses prefer_timestamp_ntz > > > Key: SPARK-48340 > URL: https://issues.apache.org/jira/browse/SPARK-48340 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-05-20-18-38-39-769.png > > > !image-2024-05-20-18-38-22-486.png|width=378,height=227! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48340) TimestampNTZ schema inference misses prefer_timestamp_ntz
angerszhu created SPARK-48340: - Summary: TimestampNTZ schema inference misses prefer_timestamp_ntz Key: SPARK-48340 URL: https://issues.apache.org/jira/browse/SPARK-48340 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.1, 4.0.0 Reporter: angerszhu Attachments: image-2024-05-20-18-38-39-769.png !image-2024-05-20-18-38-22-486.png|width=378,height=227! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48340) TimestampNTZ schema inference misses prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48340: -- Description: !image-2024-05-20-18-38-39-769.png|width=746,height=450! (was: !image-2024-05-20-18-38-22-486.png|width=378,height=227!) > TimestampNTZ schema inference misses prefer_timestamp_ntz > > > Key: SPARK-48340 > URL: https://issues.apache.org/jira/browse/SPARK-48340 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-05-20-18-38-39-769.png > > > !image-2024-05-20-18-38-39-769.png|width=746,height=450! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
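The description here is a screenshot, so the exact repro is not visible in this digest. As a hedged sketch of the reported area (PySpark schema inference and its internal prefer_timestamp_ntz flag; the snippet below is an assumption about the repro shape, not taken from the ticket): inference from tz-naive datetime values should honor spark.sql.timestampType, and the report suggests at least one inference path drops that preference.

{code:python}
import datetime

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Ask inference to prefer TimestampNTZType for tz-naive datetimes.
    .config("spark.sql.timestampType", "TIMESTAMP_NTZ")
    .getOrCreate()
)

# With the config above the inferred field should be timestamp_ntz; an
# inference path that forgets to pass prefer_timestamp_ntz yields timestamp.
df = spark.createDataFrame([(datetime.datetime(2024, 5, 20, 18, 38),)], ["ts"])
df.printSchema()
{code}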
[jira] [Updated] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48265: -- Description: {code:java} 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: === Result of Batch LocalRelation === GlobalLimit 21 GlobalLimit 21 +- LocalLimit 21 +- LocalLimit 21 ! +- Union false, false +- LocalLimit 21 ! :- LocalLimit 21 +- Project [item_id#647L] ! : +- Project [item_id#647L] +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- Relation db.table[,... 91 more fields] parquet ! : +- Relation db.table[,... 91 more fields] parquet ! +- LocalLimit 21 ! +- Project [item_id#738L] ! +- LocalRelation , [, ... 91 more fields] 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian Products has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch NormalizeFloatingNumbers has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch ReplaceUpdateFieldsExpression has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only Query has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from PartitionPruning has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that cannot be pushed down has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs has no effect. 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === GlobalLimit 21 GlobalLimit 21 !+- LocalLimit 21 +- LocalLimit least(, ... 2 more fields) ! +- LocalLimit 21 +- Project [item_id#647L] ! +- Project [item_id#647L] +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- Relation db.table[,... 91 more fields] parquet ! +- Relation db.table[,... 91 more fields] parquet {code} > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L]
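Reading the log above: the LocalRelation batch prunes the empty union branch, and EliminateLimits then merges the now-nested limits into a least(...) expression that no later rule folds back to a literal, which is what the summary means by the batch needing constant folding. A hedged repro shape follows; the table and column names are lifted from the log, and the statically empty second branch is an assumption used to trigger the pruning.

{code:python}
# Assumes an active SparkSession bound to `spark`.
# A LIMIT over a UNION ALL where one branch is provably empty: after the
# empty branch is pruned, the pushed-down LocalLimit 21 stacks on the outer
# one and EliminateLimits merges them into `least(21, 21)` instead of 21.
spark.sql("""
    SELECT item_id FROM (
        SELECT item_id FROM db.table
        WHERE tz_type = 'local' AND grass_region = 'BR'
        UNION ALL
        SELECT item_id FROM db.table WHERE 1 = 0
    ) t
    LIMIT 21
""").explain(True)
{code}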
[jira] [Created] (SPARK-48265) Infer window group limit batch should do constant folding
angerszhu created SPARK-48265: - Summary: Infer window group limit batch should do constant folding Key: SPARK-48265 URL: https://issues.apache.org/jira/browse/SPARK-48265 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1, 4.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48155) AQEPropagateEmptyRelation can leave a LogicalQueryStage with only a broadcast stage and no join, and execution then fails
[ https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48155: -- Description: {code:java} 24/05/07 09:48:55 ERROR [main] PlanChangeLogger: === Applying Rule org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === Project [date#124, station_name#0, shipment_id#14] +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 AND station_type#1 IN (3,12)) +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 more fields] ! +- Join LeftOuter, ((cast(date#124 as timestamp) >= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) as timestamp))) ! :- LogicalQueryStage Generate explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), false, [date#124], BroadcastQueryStage 0 ! +- LocalRelation , [shipment_id#14, station_name#5, ... 3 more fields]24/05/07 09:48:55 ERROR [main] Project [date#124, station_name#0, shipment_id#14] +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 AND station_type#1 IN (3,12)) +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 more fields] ! +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 more fields] ! +- LogicalQueryStage Generate explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), false, [date#124], BroadcastQueryStage 0 {code} {code:java} java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.UnsupportedOperationException: BroadcastExchange does not support the execute() code path.at org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526) at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454) at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750) at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.aggregate.SortAggregateExec.doExecute(SortAggregateExec.scala:55) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:144) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:144) at org.apache.spark.sql.execution.exchange.ShuffleExchangeE
[jira] [Created] (SPARK-48155) AQEPropagateEmptyRelation can leave a LogicalQueryStage with only a broadcast stage and no join, and execution then fails
angerszhu created SPARK-48155: - Summary: PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed Key: SPARK-48155 URL: https://issues.apache.org/jira/browse/SPARK-48155 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.4, 3.5.1, 3.2.1 Reporter: angerszhu {code:java} 24/05/07 09:48:55 ERROR [main] PlanChangeLogger: === Applying Rule org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === Project [date#124, station_name#0, shipment_id#14] +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 AND station_type#1 IN (3,12)) +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 more fields] ! +- Join LeftOuter, ((cast(date#124 as timestamp) >= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) as timestamp)) AND (cast(date#124 as timestamp) + INTERVAL '-4' DAY <= cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) as timestamp))) ! :- LogicalQueryStage Generate explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), false, [date#124], BroadcastQueryStage 0 ! +- LocalRelation , [shipment_id#14, station_name#5, ... 3 more fields]24/05/07 09:48:55 ERROR [main] PlanChangeLogger: === Applying Rule org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation === Project [date#124, station_name#0, shipment_id#14] +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 AND station_type#1 IN (3,12)) +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 more fields] ! +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 more fields] ! +- LogicalQueryStage Generate explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), false, [date#124], BroadcastQueryStage 0 {code} {code:java} java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.UnsupportedOperationException: BroadcastExchange does not support the execute() code path.at org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526) at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454) at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497) at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180) at org.apache.spark.sql.execution.aggregate.SortAggregateExec.doExecute(SortAggregateExec.scala:55) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) at org.apache.spark.s
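Summarizing the plan log above: AQEPropagateEmptyRelation rewrites the LeftOuter join over an empty LocalRelation into a Project that nulls out the right-side columns, leaving the LogicalQueryStage wrapping BroadcastQueryStage 0 as a plain child with no join above it; executing that plan then hits BroadcastExchange.doExecute(), which throws "does not support the execute() code path". A hedged sketch of that shape follows; all names are illustrative and whether it reproduces depends on the build's AQE rule ordering.

{code:python}
# Assumes an active SparkSession bound to `spark`.
from pyspark.sql import functions as F

# Broadcast-hinted exploded dates, mirroring the Generate/BroadcastQueryStage
# side of the logged plan.
dates = spark.range(1).select(
    F.explode(F.array(F.lit("2024-05-06"), F.lit("2024-05-07"))).alias("date")
).hint("broadcast")

# A side that is only empty at runtime, so the empty-relation propagation is
# done by AQE rather than by the static optimizer.
shipments = (
    spark.range(10)
    .where("id < 0")
    .selectExpr("CAST(id AS STRING) AS date", "id AS shipment_id")
)

dates.join(shipments, "date", "left_outer").show()
{code}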
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48027: -- Description: {code:java} with refund_info as ( select loan_id, 1 as refund_type from default.table_b where grass_date = '2024-04-25' ), next_month_time as ( select /*+ broadcast(b, c) */ loan_id ,1 as final_repayment_time FROM default.table_c where grass_date = '2024-04-25' ) select a.loan_id ,c.final_repayment_time ,b.refund_type from (select loan_id from default.table_a2 where grass_date = '2024-04-25' union all select loan_id from default.table_a1 where grass_date = '2024-04-24' ) a left join refund_info b on a.loan_id = b.loan_id left join next_month_time c on a.loan_id = c.loan_id ; {code} !image-2024-04-28-16-38-37-510.png|width=899,height=201! In this query, the runtime filter built from table_b is injected into table_c's scan, but the table_b join is LEFT OUTER, so table_c loses data. This is caused by InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the left plan is a UNION the result is None, and it then zips the left/right keys to extract a filter from the right side, causing this issue. !image-2024-04-28-16-41-08-392.png|width=883,height=706! was: {code:java} with refund_info as ( select loan_id, 1 as refund_type from credit.table_b where grass_date = '2024-04-25' ), next_month_time as ( select /*+ broadcast(b, c) */ loan_id ,1 as final_repayment_time FROM credit.table_c where grass_date = '2024-04-25' ) select a.loan_id ,c.final_repayment_time ,b.refund_type from (select loan_id from credit_fund.table_a2 where grass_date = '2024-04-25' -- loans newly sold that day union all select loan_id from credit_fund.table_a1 where grass_date = '2024-04-24' and loan_abs_status != 600 -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) ) a left join refund_info b on a.loan_id = b.loan_id left join next_month_time c on a.loan_id = c.loan_id ; {code} !image-2024-04-28-16-38-37-510.png|width=899,height=201! In this query, the runtime filter built from table_b is injected into table_c's scan, but the table_b join is LEFT OUTER, so table_c loses data. This is caused by InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the left plan is a UNION the result is None, and it then zips the left/right keys to extract a filter from the right side, causing this issue. !image-2024-04-28-16-41-08-392.png|width=883,height=706! > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > default.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM default.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > default.table_a2 > where grass_date = '2024-04-25' > union all > select > loan_id > from > default.table_a1 > where grass_date = '2024-04-24' > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201!
> > In this query, the runtime filter built from table_b is injected into table_c's scan, but the > table_b join is LEFT OUTER, so table_c loses data. > This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since > the left plan is a UNION the result is None, and it then zips the left/right keys to extract a filter from the > right side, causing this issue. > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
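Distilling the report, the pattern looks like the sketch below (table names and the stand-in views are hypothetical; in the report these are large partitioned tables, which is what makes runtime filter injection kick in). The probe side is a UNION and both joins are LEFT OUTER; a runtime filter derived from the b join must not be pushed into c's scan, because the outer join keeps a-rows that b does not match, and those rows may still match c.

{code:python}
# Assumes an active SparkSession bound to `spark`.
# Stand-in views so the query parses and runs.
for name in ("t1", "t2", "b", "c"):
    spark.range(5).selectExpr(
        "id AS loan_id", "1 AS refund_type", "1 AS final_repayment_time"
    ).createOrReplaceTempView(name)

spark.sql("""
    SELECT a.loan_id, b.refund_type, c.final_repayment_time
    FROM (SELECT loan_id FROM t1 UNION ALL SELECT loan_id FROM t2) a
    LEFT JOIN b ON a.loan_id = b.loan_id
    LEFT JOIN c ON a.loan_id = c.loan_id
""").explain()
{code}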
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48027: -- Description: {code:java} with refund_info as ( select loan_id, 1 as refund_type from credit.table_b where grass_date = '2024-04-25' ), next_month_time as ( select /*+ broadcast(b, c) */ loan_id ,1 as final_repayment_time FROM credit.table_c where grass_date = '2024-04-25' ) select a.loan_id ,c.final_repayment_time ,b.refund_type from (select loan_id from credit_fund.table_a2 where grass_date = '2024-04-25' -- loans newly sold that day union all select loan_id from credit_fund.table_a1 where grass_date = '2024-04-24' and loan_abs_status != 600 -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) ) a left join refund_info b on a.loan_id = b.loan_id left join next_month_time c on a.loan_id = c.loan_id ; {code} !image-2024-04-28-16-38-37-510.png|width=899,height=201! In this query, the runtime filter built from table_b is injected into table_c's scan, but the table_b join is LEFT OUTER, so table_c loses data. This is caused by InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since the left plan is a UNION the result is None, and it then zips the left/right keys to extract a filter from the right side, causing this issue. !image-2024-04-28-16-41-08-392.png|width=883,height=706! was: {code:java} with refund_info as ( select loan_id, 1 as refund_type from credit.table_b where grass_date = '2024-04-25' ), next_month_time as ( select /*+ broadcast(b, c) */ loan_id ,1 as final_repayment_time FROM credit.table_c where grass_date = '2024-04-25' ) select a.loan_id ,c.final_repayment_time ,b.refund_type from (select loan_id from credit_fund.table_a2 where grass_date = '2024-04-25' -- loans newly sold that day union all select loan_id from credit_fund.table_a1 where grass_date = '2024-04-24' and loan_abs_status != 600 -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) ) a left join refund_info b on a.loan_id = b.loan_id left join next_month_time c on a.loan_id = c.loan_id ; {code} !image-2024-04-28-16-38-37-510.png|width=899,height=201! > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > credit.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM credit.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > credit_fund.table_a2 > where grass_date = '2024-04-25' -- loans newly sold that day union all > select > loan_id > from > credit_fund.table_a1 > where grass_date = '2024-04-24' and loan_abs_status != 600 > -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! > > In this query, the runtime filter built from table_b is injected into table_c's scan, but the > table_b join is LEFT OUTER, so table_c loses data.
> This is caused by > InjectRuntimeFilter.extractSelectiveFilterOverScan(): when handling the join, since > the left plan is a UNION the result is None, and it then zips the left/right keys to extract a filter from the > right side, causing this issue. > !image-2024-04-28-16-41-08-392.png|width=883,height=706! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48027: -- Attachment: image-2024-04-28-16-41-08-392.png > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-04-28-16-38-37-510.png, > image-2024-04-28-16-41-08-392.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > credit.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM credit.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > credit_fund.table_a2 > where grass_date = '2024-04-25' -- loans newly sold that day union all > select > loan_id > from > credit_fund.table_a1 > where grass_date = '2024-04-24' and loan_abs_status != 600 > -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48027: -- Description: {code:java} with refund_info as ( select loan_id, 1 as refund_type from credit.table_b where grass_date = '2024-04-25' ), next_month_time as ( select /*+ broadcast(b, c) */ loan_id ,1 as final_repayment_time FROM credit.table_c where grass_date = '2024-04-25' ) select a.loan_id ,c.final_repayment_time ,b.refund_type from (select loan_id from credit_fund.table_a2 where grass_date = '2024-04-25' -- loans newly sold that day union all select loan_id from credit_fund.table_a1 where grass_date = '2024-04-24' and loan_abs_status != 600 -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) ) a left join refund_info b on a.loan_id = b.loan_id left join next_month_time c on a.loan_id = c.loan_id ; {code} !image-2024-04-28-16-38-37-510.png|width=899,height=201! > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-04-28-16-38-37-510.png > > > {code:java} > with > refund_info as ( > select > loan_id, > 1 as refund_type > from > credit.table_b > where grass_date = '2024-04-25' > > ), > next_month_time as ( > select /*+ broadcast(b, c) */ > loan_id > ,1 as final_repayment_time > FROM credit.table_c > where grass_date = '2024-04-25' > ) > select > a.loan_id > ,c.final_repayment_time > ,b.refund_type from > (select > loan_id > from > credit_fund.table_a2 > where grass_date = '2024-04-25' -- loans newly sold that day union all > select > loan_id > from > credit_fund.table_a1 > where grass_date = '2024-04-24' and loan_abs_status != 600 > -- historical loans cumulated up to cutoff, filtering out loans revoked again after cutoff (status transition 100 -> 600) > ) a > left join > refund_info b > on a.loan_id = b.loan_id > left join > next_month_time c > on a.loan_id = c.loan_id > ; > {code} > !image-2024-04-28-16-38-37-510.png|width=899,height=201! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
[ https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48027: -- Attachment: image-2024-04-28-16-38-37-510.png > InjectRuntimeFilter for multi-level join should check child join type > - > > Key: SPARK-48027 > URL: https://issues.apache.org/jira/browse/SPARK-48027 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-04-28-16-38-37-510.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type
angerszhu created SPARK-48027: - Summary: InjectRuntimeFilter for multi-level join should check child join type Key: SPARK-48027 URL: https://issues.apache.org/jira/browse/SPARK-48027 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3, 3.5.1, 4.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33826) InsertIntoHiveTable generates HDFS files with an invalid user
[ https://issues.apache.org/jira/browse/SPARK-33826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839449#comment-17839449 ] angerszhu commented on SPARK-33826: --- What is RIK? I'm not sure what you mean. > InsertIntoHiveTable generates HDFS files with an invalid user > > > Key: SPARK-33826 > URL: https://issues.apache.org/jira/browse/SPARK-33826 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 3.0.0 >Reporter: Zhang Jianguo >Priority: Minor > > *Arch:* Hive on Spark. > > *Version:* Spark 2.3.2 > > *Conf:* > Enable user impersonation > hive.server2.enable.doAs=true > > *Scenario:* > Thriftserver is running as login user A, and tasks run as user A too. > A client executes SQL as user B. > > Data generated by the sql "insert into TABLE \[tbl\] select XXX from ." is > written to HDFS on the executor, and the executor doesn't know B. > > *{color:#de350b}So the owner of the file written to HDFS will be user A when it should > be B.{color}* > > I also checked the implementation of Spark 3.0.0; it could have the same issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47294) OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)
[ https://issues.apache.org/jira/browse/SPARK-47294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu resolved SPARK-47294. --- Resolution: Not A Problem > OptimizeSkewInRebalanceRepartitions should support > ProjectExec(_,ShuffleQueryStageExec) > --- > > Key: SPARK-47294 > URL: https://issues.apache.org/jira/browse/SPARK-47294 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > Labels: pull-request-available > > Currently OptimizeSkewInRebalanceRepartitions only matches a bare > ShuffleQueryStageExec, which only covers plain SQL queries; it can't work for > inserts, since there is a Project between the ShuffleQueryStageExec and the insert > command: > {code:java} > plan transformUp { > case p @ ProjectExec(_, stage: ShuffleQueryStageExec) if > isSupported(stage.shuffle) => > p.copy(child = tryOptimizeSkewedPartitions(stage)) > case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) => > tryOptimizeSkewedPartitions(stage) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
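For context, the scenario the quoted snippet targets might look like the sketch below. All table and column names are hypothetical, and this is only one shape that puts a Project between the rebalance shuffle stage and the write: the REBALANCE hint adds a shuffle whose skewed partitions the skew-handling rule should split, and under an INSERT the matching reportedly failed because the rule only matched a bare ShuffleQueryStageExec.

{code:python}
# Assumes an active SparkSession bound to `spark`.
# Hypothetical tables; USING parquet keeps the sketch self-contained.
spark.sql("CREATE TABLE IF NOT EXISTS source_t (id BIGINT, part_col INT) USING parquet")
spark.sql("CREATE TABLE IF NOT EXISTS target_t (id BIGINT, part_col INT) USING parquet")

spark.sql("""
    INSERT OVERWRITE TABLE target_t
    SELECT /*+ REBALANCE(part_col) */ id, part_col FROM source_t
""")
{code}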
[jira] [Updated] (SPARK-47294) OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)
[ https://issues.apache.org/jira/browse/SPARK-47294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-47294: -- Description: Currently OptimizeSkewInRebalanceRepartitions only matches a bare ShuffleQueryStageExec, which only covers plain SQL queries; it can't work for inserts, since there is a Project between the ShuffleQueryStageExec and the insert command: {code:java} plan transformUp { case p @ ProjectExec(_, stage: ShuffleQueryStageExec) if isSupported(stage.shuffle) => p.copy(child = tryOptimizeSkewedPartitions(stage)) case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) => tryOptimizeSkewedPartitions(stage) } {code} > OptimizeSkewInRebalanceRepartitions should support > ProjectExec(_,ShuffleQueryStageExec) > --- > > Key: SPARK-47294 > URL: https://issues.apache.org/jira/browse/SPARK-47294 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > > Currently OptimizeSkewInRebalanceRepartitions only matches a bare > ShuffleQueryStageExec, which only covers plain SQL queries; it can't work for > inserts, since there is a Project between the ShuffleQueryStageExec and the insert > command: > {code:java} > plan transformUp { > case p @ ProjectExec(_, stage: ShuffleQueryStageExec) if > isSupported(stage.shuffle) => > p.copy(child = tryOptimizeSkewedPartitions(stage)) > case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) => > tryOptimizeSkewedPartitions(stage) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47294) OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)
angerszhu created SPARK-47294: - Summary: OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec) Key: SPARK-47294 URL: https://issues.apache.org/jira/browse/SPARK-47294 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1, 4.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46741) CacheTableAsSelect should inherit from CTEInChildren to make sure it can be matched
[ https://issues.apache.org/jira/browse/SPARK-46741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46741: -- Description: In the current code, since CacheTableAsSelect does not inherit CTEInChildren, it still returns forceInline, so the cached result can't be matched. !image-2024-01-17-11-48-28-867.png|width=859,height=363! > CacheTableAsSelect should inherit from CTEInChildren to make sure it can be > matched > > > Key: SPARK-46741 > URL: https://issues.apache.org/jira/browse/SPARK-46741 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-01-17-11-48-28-867.png > > > In the current code, since CacheTableAsSelect does not inherit CTEInChildren, it still > returns forceInline, so the cached result can't be matched. > !image-2024-01-17-11-48-28-867.png|width=859,height=363! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
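A hedged sketch of the failing shape, since the description is a screenshot: the SQL below and its names are assumptions, not taken from the ticket. The idea is caching a query that carries a WITH clause; if CacheTableAsSelect force-inlines CTE references instead of hosting them as a CTEInChildren node, the cached plan's shape can differ from the shape of later lookups, so the InMemoryRelation is never matched.

{code:python}
# Assumes an active SparkSession bound to `spark`.
spark.sql("CACHE TABLE cached_v AS WITH v AS (SELECT 1 AS id) SELECT * FROM v")

# Expect the same query to hit the cache (an InMemoryRelation in the plan);
# with the force-inlined CTE shape, the match can be missed.
spark.sql("WITH v AS (SELECT 1 AS id) SELECT * FROM v").explain()
{code}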
[jira] [Updated] (SPARK-46741) CacheTableAsSelect should inherit from CTEInChildren to make sure it can be matched
[ https://issues.apache.org/jira/browse/SPARK-46741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46741: -- Attachment: image-2024-01-17-11-48-28-867.png > CacheTableAsSelect should inherit from CTEInChildren to make sure it can be > matched > > > Key: SPARK-46741 > URL: https://issues.apache.org/jira/browse/SPARK-46741 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: angerszhu >Priority: Major > Attachments: image-2024-01-17-11-48-28-867.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46741) CacheTableAsSelect should inherit from CTEInChildren to make sure it can be matched
angerszhu created SPARK-46741: - Summary: CacheTableAsSelect should inherit from CTEInChildren to make sure it can be matched Key: SPARK-46741 URL: https://issues.apache.org/jira/browse/SPARK-46741 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46034) SparkContext add file should also copy file to local root path
[ https://issues.apache.org/jira/browse/SPARK-46034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46034: -- Description: The case below failed with FileNotFoundException {code:java} add jar hdfs://path/search_hadoop_udf-1.0.0-SNAPSHOT.jar; add file hdfs://path/feature_map.txt; CREATE or replace TEMPORARY FUNCTION si_to_fn AS "com.shopee.deep.data_mart.udf.SlotIdToFeatName"; select si_to_fn(k, './feature_map.txt') as feat_name from ( select 'slot_8116' as k union all select 'slot_2219' as k) A; {code} The cause is that the query reads from an inline values table, so the task runs on the driver, but the driver didn't copy this file to its root path, so the lookup failed. > SparkContext add file should also copy file to local root path > -- > > Key: SPARK-46034 > URL: https://issues.apache.org/jira/browse/SPARK-46034 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > The case below failed with FileNotFoundException > {code:java} > add jar hdfs://path/search_hadoop_udf-1.0.0-SNAPSHOT.jar; > add file hdfs://path/feature_map.txt; > CREATE or replace TEMPORARY FUNCTION si_to_fn AS > "com.shopee.deep.data_mart.udf.SlotIdToFeatName"; > select si_to_fn(k, './feature_map.txt') as feat_name > from ( > select 'slot_8116' as k > union all > select 'slot_2219' as k) > A; {code} > The cause is that the query reads from an inline values table, so the task runs on the > driver, but the driver didn't copy this file to its root path, so the lookup failed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
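For context on the driver/executor asymmetry described above, here is a hedged sketch (the hdfs://path/... value is the elided path from the report; the API calls are standard PySpark): files shipped with ADD FILE / SparkContext.addFile are localized into each executor's working directory, so a relative './feature_map.txt' resolves there, while on the driver they land under SparkFiles.getRootDirectory(), which is generally not the driver's current working directory.

{code:python}
from pyspark import SparkFiles

# Assumes an active SparkSession bound to `spark`; path elided in the report.
spark.sparkContext.addFile("hdfs://path/feature_map.txt")

# On executors the file sits in the task's working directory, so a relative
# './feature_map.txt' opens fine. On the driver (e.g. when a values-only
# query is evaluated locally) the file is under the SparkFiles root instead,
# hence the FileNotFoundException for the relative path.
local_path = SparkFiles.get("feature_map.txt")
print(local_path, SparkFiles.getRootDirectory())
{code}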
[jira] [Created] (SPARK-46034) SparkContext add file should also copy file to local root path
angerszhu created SPARK-46034: - Summary: SparkContext add file should also copy file to local root path Key: SPARK-46034 URL: https://issues.apache.org/jira/browse/SPARK-46034 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.1 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[ https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46006: -- Description: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called # Since the Yarn ApplicationMaster hasn't finished, it still calls YarnAllocator.allocateResources() # Since the driver endpoint is stopped, newly allocated executors fail to register # until it triggers "Max number of executor failures" Root cause: before stopping, CoarseGrainedSchedulerBackend.stop() calls YarnSchedulerBackend.requestTotalExecutor() to clean up the request info !image-2023-11-20-17-56-56-507.png|width=898,height=297! From the log we confirmed that CoarseGrainedSchedulerBackend.stop() was called. When YarnAllocator handles the (now empty) resource request, since resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning targetNumExecutorsPerResourceProfileId. !image-2023-11-20-17-56-45-212.png|width=708,height=379! was: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called # Since the Yarn ApplicationMaster hasn't finished, it still calls YarnAllocator.allocateResources() Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, triggering "Max number of executor failures" > YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after > YarnSchedulerBackend calls stop > > > Key: SPARK-46006 > URL: https://issues.apache.org/jira/browse/SPARK-46006 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > Attachments: image-2023-11-20-17-56-45-212.png, > image-2023-11-20-17-56-56-507.png > > > We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck > somewhere. > That causes the situation below: > # User calls sc.stop() > # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called > # Since the Yarn ApplicationMaster hasn't finished, it still calls > YarnAllocator.allocateResources() > # Since the driver endpoint is stopped, newly allocated executors fail to register > # until it triggers "Max number of executor failures" > Root cause: before stopping, CoarseGrainedSchedulerBackend.stop() calls > YarnSchedulerBackend.requestTotalExecutor() to clean up the request info > !image-2023-11-20-17-56-56-507.png|width=898,height=297! > > From the log we confirmed that CoarseGrainedSchedulerBackend.stop() was > called. > > > When YarnAllocator handles the (now empty) resource request, since > resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning > targetNumExecutorsPerResourceProfileId. > !image-2023-11-20-17-56-45-212.png|width=708,height=379! > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
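To make the missed cleanup concrete, here is a toy model of the bookkeeping described above. This is illustrative Python, not Spark's YarnAllocator: the names and the dict shape are assumptions; the point it mirrors is that an update loop over the requested map never touches profiles absent from an empty request, so a stale target keeps driving allocation after stop.

{code:python}
# resourceProfileId -> desired executor count, as tracked by the allocator.
target_num_executors = {0: 50}


def request_total_executors(resource_profile_to_total: dict) -> None:
    """Update per-profile targets from a request map.

    Bug shape: only profiles present in the request are written, so an empty
    "stop everything" request leaves every stale target behind.
    """
    for rp_id, num in resource_profile_to_total.items():
        target_num_executors[rp_id] = num


request_total_executors({})             # shutdown sends an empty request
assert target_num_executors == {0: 50}  # stale target survives the stop, so
                                        # the allocator keeps asking for 50
{code}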
[jira] [Updated] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[ https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46006: -- Attachment: image-2023-11-20-17-56-56-507.png > YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after > YarnSchedulerBackend calls stop > > > Key: SPARK-46006 > URL: https://issues.apache.org/jira/browse/SPARK-46006 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > Attachments: image-2023-11-20-17-56-45-212.png, > image-2023-11-20-17-56-56-507.png > > > We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck > somewhere. > That causes the situation below: > # User calls sc.stop() > # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called > # Since the Yarn ApplicationMaster hasn't finished, it still calls > YarnAllocator.allocateResources() > Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, > triggering "Max number of executor failures" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[ https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46006: -- Attachment: image-2023-11-20-17-56-45-212.png > YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after > YarnSchedulerBackend calls stop > > > Key: SPARK-46006 > URL: https://issues.apache.org/jira/browse/SPARK-46006 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > Attachments: image-2023-11-20-17-56-45-212.png, > image-2023-11-20-17-56-56-507.png > > > We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck > somewhere. > That causes the situation below: > # User calls sc.stop() > # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called > # Since the Yarn ApplicationMaster hasn't finished, it still calls > YarnAllocator.allocateResources() > Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, > triggering "Max number of executor failures" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[ https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46006: -- Description: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called # Since the Yarn ApplicationMaster hasn't finished, it still calls YarnAllocator.allocateResources() Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, triggering "Max number of executor failures" was: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called The ApplicationMaster has not finished yet and keeps calling YarnAllocator.allocateResources() Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, triggering "Max number of executor failures" > YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after > YarnSchedulerBackend calls stop > > > Key: SPARK-46006 > URL: https://issues.apache.org/jira/browse/SPARK-46006 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck > somewhere. > That causes the situation below: > # User calls sc.stop() > # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called > # Since the Yarn ApplicationMaster hasn't finished, it still calls > YarnAllocator.allocateResources() > Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, > triggering "Max number of executor failures" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[ https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46006: -- Description: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called The ApplicationMaster has not finished yet and keeps calling YarnAllocator.allocateResources() Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, triggering "Max number of executor failures" was: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck The ApplicationMaster has not finished yet and keeps calling YarnAllocator.allocateResources() Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, triggering "Max number of executor failures" > YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after > YarnSchedulerBackend calls stop > > > Key: SPARK-46006 > URL: https://issues.apache.org/jira/browse/SPARK-46006 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck > somewhere. > That causes the situation below: > # User calls sc.stop() > # sc.stop() gets stuck partway, but SchedulerBackend.stop was already called > The ApplicationMaster has not finished yet and keeps calling YarnAllocator.allocateResources() > Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, > triggering "Max number of executor failures" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
[ https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-46006: -- Description: We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck somewhere. That causes the situation below: # User calls sc.stop() # sc.stop() gets stuck The ApplicationMaster has not finished yet and keeps calling YarnAllocator.allocateResources() Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, triggering "Max number of executor failures" > YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after > YarnSchedulerBackend calls stop > > > Key: SPARK-46006 > URL: https://issues.apache.org/jira/browse/SPARK-46006 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > We hit a case where the user calls sc.stop() after running all custom code, but it gets stuck > somewhere. > That causes the situation below: > # User calls sc.stop() > # > sc.stop() gets stuck > The ApplicationMaster has not finished yet and keeps calling YarnAllocator.allocateResources() > Because the driver endpoint is already closed, allocateResources keeps requesting new executors, which fail to register, > triggering "Max number of executor failures" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46006) YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop
angerszhu created SPARK-46006: - Summary: YarnAllocator misses cleaning targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend calls stop Key: SPARK-46006 URL: https://issues.apache.org/jira/browse/SPARK-46006 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 3.5.0, 3.4.1, 3.3.2, 3.2.4, 3.1.3 Reporter: angerszhu Fix For: 3.4.2, 4.0.0, 3.5.1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43064) Spark SQL CLI SQL tab should only show one statement once
[ https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-43064: -- Description: !screenshot-1.png|width=996,height=554! (was: !screenshot-1.png! ) > Spark SQL CLI SQL tab should only show one statement once > -- > > Key: SPARK-43064 > URL: https://issues.apache.org/jira/browse/SPARK-43064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png|width=996,height=554! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43064) Spark SQL CLI SQL tab should only show one statement once
[ https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-43064: -- Description: !screenshot-1.png! > Spark SQL CLI SQL tab should only show one statement once > -- > > Key: SPARK-43064 > URL: https://issues.apache.org/jira/browse/SPARK-43064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43064) Spark SQL CLI SQL tab should only show one statement once
[ https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-43064: -- Attachment: screenshot-1.png > Spark SQL CLI SQL tab should only show one statement once > -- > > Key: SPARK-43064 > URL: https://issues.apache.org/jira/browse/SPARK-43064 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43064) Spark SQL CLI SQL tab should only show one statement once
angerszhu created SPARK-43064: - Summary: Spark SQL CLI SQL tab should only show one statement once Key: SPARK-43064 URL: https://issues.apache.org/jira/browse/SPARK-43064 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42699) SparkConnectServer should make the client and AM exit with the same code
angerszhu created SPARK-42699: - Summary: SparkConnectServer should make the client and AM exit with the same code Key: SPARK-42699 URL: https://issues.apache.org/jira/browse/SPARK-42699 Project: Spark Issue Type: Sub-task Components: Connect, Spark Core Affects Versions: 3.5.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42698) Client mode submit client should keep the same exit code as the AM
[ https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-42698: -- Parent: SPARK-36623 Issue Type: Sub-task (was: Bug) > Client mode submit client should keep the same exit code as the AM > > > Key: SPARK-42698 > URL: https://issues.apache.org/jira/browse/SPARK-42698 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > > {code:java} > try { > app.start(childArgs.toArray, sparkConf) > } catch { > case t: Throwable => > throw findCause(t) > } finally { > if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) && > !isThriftServer(args.mainClass)) { > try { > SparkContext.getActive.foreach(_.stop()) > } catch { > case e: Throwable => logError(s"Failed to close SparkContext: $e") > } > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42698) Client mode submit client should keep the same exit code as the AM
[ https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-42698: -- Description: {code:java} try { app.start(childArgs.toArray, sparkConf) } catch { case t: Throwable => throw findCause(t) } finally { if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) { try { SparkContext.getActive.foreach(_.stop()) } catch { case e: Throwable => logError(s"Failed to close SparkContext: $e") } } } } {code} > Client mode submit client should keep the same exit code as the AM > > > Key: SPARK-42698 > URL: https://issues.apache.org/jira/browse/SPARK-42698 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.5.0 >Reporter: angerszhu >Priority: Major > > {code:java} > try { > app.start(childArgs.toArray, sparkConf) > } catch { > case t: Throwable => > throw findCause(t) > } finally { > if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) && > !isThriftServer(args.mainClass)) { > try { > SparkContext.getActive.foreach(_.stop()) > } catch { > case e: Throwable => logError(s"Failed to close SparkContext: $e") > } > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
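The snippet quoted above is the cleanup on the submit path; the sketch below illustrates the intent of the report under stated assumptions (the object name and the error-to-exit-code mapping are hypothetical, not Spark's actual submit code): the client process should exit with the same code the application finished with, and cleanup in the finally block must not mask it.

{code:java}
object ClientExitSketch {
  // Run the user application and mirror its outcome in the client's exit code.
  def runAndExit(app: () => Unit): Unit = {
    val exitCode =
      try {
        app()
        0 // success
      } catch {
        case t: Throwable =>
          t.printStackTrace()
          1 // in a real YARN client this would come from the AM's final status
      } finally {
        // cleanup (e.g. stopping the SparkContext) goes here and must not
        // change the exit code computed above
      }
    sys.exit(exitCode)
  }
}
{code}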
[jira] [Created] (SPARK-42698) Client mode submit client should keep the same exit code as the AM
angerszhu created SPARK-42698: - Summary: Client mode submit client should keep the same exit code as the AM Key: SPARK-42698 URL: https://issues.apache.org/jira/browse/SPARK-42698 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 3.5.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40019) Refactor comment of ArrayType
[ https://issues.apache.org/jira/browse/SPARK-40019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-40019: -- Summary: Refactor comment of ArrayType (was: Refactor comment of ArrayType and MapType) > Refactor comment of ArrayType > - > > Key: SPARK-40019 > URL: https://issues.apache.org/jira/browse/SPARK-40019 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0, 3.2.2 >Reporter: angerszhu >Priority: Major > > The parameter `containsNull` of ArrayType/MapType is confusing at the moment; we need to > add a comment -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40019) Refactor comment of ArrayType and MapType
angerszhu created SPARK-40019: - Summary: Refactor comment of ArrayType and MapType Key: SPARK-40019 URL: https://issues.apache.org/jira/browse/SPARK-40019 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.2, 3.3.0 Reporter: angerszhu The parameter `containsNull` of ArrayType/MapType is confusing at the moment; we need to add a comment -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
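Since the confusion is about what `containsNull` actually constrains, a short illustration using the standard Spark types API (values chosen for the example):

{code:java}
import org.apache.spark.sql.types._

// containsNull describes the *elements*, not the array column itself:
// with containsNull = false no element may be null, but the column can
// still be null if its StructField is declared nullable.
val noNullElements = ArrayType(IntegerType, containsNull = false)
val nullableElements = ArrayType(IntegerType, containsNull = true)

// MapType's valueContainsNull likewise constrains only the map values;
// map keys are never allowed to be null.
val scores = MapType(StringType, LongType, valueContainsNull = true)
{code}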
[jira] [Updated] (SPARK-39776) Join's verbose string doesn't contain JoinType
[ https://issues.apache.org/jira/browse/SPARK-39776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39776: -- Description: The current verbose string doesn't include the joinType {code:java} (5) BroadcastHashJoin [codegen id : 8] Left keys [1]: [ss_sold_date_sk#3] Right keys [1]: [d_date_sk#5] Join condition: None {code} {code:java} override def verboseStringWithOperatorId(): String = { val joinCondStr = if (condition.isDefined) { s"${condition.get}" } else "None" if (leftKeys.nonEmpty || rightKeys.nonEmpty) { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Left keys", leftKeys)} |${ExplainUtils.generateFieldString("Right keys", rightKeys)} |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } else { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } } {code} was: The current verbose string doesn't include the joinType {code:java} override def verboseStringWithOperatorId(): String = { val joinCondStr = if (condition.isDefined) { s"${condition.get}" } else "None" if (leftKeys.nonEmpty || rightKeys.nonEmpty) { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Left keys", leftKeys)} |${ExplainUtils.generateFieldString("Right keys", rightKeys)} |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } else { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } } {code} > Join's verbose string doesn't contain JoinType > - > > Key: SPARK-39776 > URL: https://issues.apache.org/jira/browse/SPARK-39776 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > The current verbose string doesn't include the joinType > {code:java} > (5) BroadcastHashJoin [codegen id : 8] > Left keys [1]: [ss_sold_date_sk#3] > Right keys [1]: [d_date_sk#5] > Join condition: None > {code} > {code:java} > override def verboseStringWithOperatorId(): String = { > val joinCondStr = if (condition.isDefined) { > s"${condition.get}" > } else "None" > if (leftKeys.nonEmpty || rightKeys.nonEmpty) { > s""" > |$formattedNodeName > |${ExplainUtils.generateFieldString("Left keys", leftKeys)} > |${ExplainUtils.generateFieldString("Right keys", rightKeys)} > |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} > |""".stripMargin > } else { > s""" > |$formattedNodeName > |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} > |""".stripMargin > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
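A minimal sketch of the kind of change the report implies, written as a standalone function (names follow the quoted snippet, but this is illustrative, not the merged patch): emit a "Join type" line alongside the keys and condition.

{code:java}
// Illustrative rewrite: same layout as the quoted method, plus a join type
// line, so plans print e.g. "Join type: LeftOuter" under BroadcastHashJoin.
def verboseStringWithJoinType(
    formattedNodeName: String,
    joinType: String,
    leftKeys: Seq[String],
    rightKeys: Seq[String],
    condition: Option[String]): String = {
  val joinCondStr = condition.getOrElse("None")
  s"""
     |$formattedNodeName
     |Join type: $joinType
     |Left keys: ${leftKeys.mkString("[", ", ", "]")}
     |Right keys: ${rightKeys.mkString("[", ", ", "]")}
     |Join condition: $joinCondStr
     |""".stripMargin
}
{code}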
[jira] [Updated] (SPARK-39776) Join's verbose string doesn't contain JoinType
[ https://issues.apache.org/jira/browse/SPARK-39776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39776: -- Description: The current verbose string doesn't include the joinType {code:java} override def verboseStringWithOperatorId(): String = { val joinCondStr = if (condition.isDefined) { s"${condition.get}" } else "None" if (leftKeys.nonEmpty || rightKeys.nonEmpty) { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Left keys", leftKeys)} |${ExplainUtils.generateFieldString("Right keys", rightKeys)} |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } else { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } } {code} was: {code:java} override def verboseStringWithOperatorId(): String = { val joinCondStr = if (condition.isDefined) { s"${condition.get}" } else "None" if (leftKeys.nonEmpty || rightKeys.nonEmpty) { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Left keys", leftKeys)} |${ExplainUtils.generateFieldString("Right keys", rightKeys)} |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } else { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } } {code} > Join's verbose string doesn't contain JoinType > - > > Key: SPARK-39776 > URL: https://issues.apache.org/jira/browse/SPARK-39776 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > The current verbose string doesn't include the joinType > {code:java} > override def verboseStringWithOperatorId(): String = { > val joinCondStr = if (condition.isDefined) { > s"${condition.get}" > } else "None" > if (leftKeys.nonEmpty || rightKeys.nonEmpty) { > s""" > |$formattedNodeName > |${ExplainUtils.generateFieldString("Left keys", leftKeys)} > |${ExplainUtils.generateFieldString("Right keys", rightKeys)} > |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} > |""".stripMargin > } else { > s""" > |$formattedNodeName > |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} > |""".stripMargin > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39776) Join's verbose string doesn't contain JoinType
angerszhu created SPARK-39776: - Summary: Join's verbose string doesn't contain JoinType Key: SPARK-39776 URL: https://issues.apache.org/jira/browse/SPARK-39776 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu {code:java} override def verboseStringWithOperatorId(): String = { val joinCondStr = if (condition.isDefined) { s"${condition.get}" } else "None" if (leftKeys.nonEmpty || rightKeys.nonEmpty) { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Left keys", leftKeys)} |${ExplainUtils.generateFieldString("Right keys", rightKeys)} |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } else { s""" |$formattedNodeName |${ExplainUtils.generateFieldString("Join condition", joinCondStr)} |""".stripMargin } } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39527) V2Catalog rename does not support newIdent with catalog
angerszhu created SPARK-39527: - Summary: V2Catalog rename does not support newIdent with catalog Key: SPARK-39527 URL: https://issues.apache.org/jira/browse/SPARK-39527 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu {code:java} test("rename a table") { sql("ALTER TABLE h2.test.empty_table RENAME TO h2.test.empty_table2") checkAnswer( sql("SHOW TABLES IN h2.test"), Seq(Row("test", "empty_table2"))) } {code} {code:java} [info] - rename a table *** FAILED *** (2 seconds, 358 milliseconds) [info] org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException: Failed table renaming from test.empty_table to h2.test.empty_table2 [info] at org.apache.spark.sql.jdbc.H2Dialect$.classifyException(H2Dialect.scala:117) [info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.classifyException(JdbcUtils.scala:1176) [info] at org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.$anonfun$renameTable$1(JDBCTableCatalog.scala:102) [info] at org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.$anonfun$renameTable$1$adapted(JDBCTableCatalog.scala:100) [info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.withConnection(JdbcUtils.scala:1184) [info] at org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.renameTable(JDBCTableCatalog.scala:100) [info] at org.apache.spark.sql.execution.datasources.v2.RenameTableExec.run(RenameTableExec.scala:51) [info] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) [info] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) [info] at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) [info] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98) [info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:111) [info] at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:171) [info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) [info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) [info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) [info] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) [info] at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584) [info] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) [info] at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560) [info] at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) [info] at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) [info] at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) [info] at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220) [info] at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
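A hedged reading of the failure above: the V2 rename path resolves the new identifier inside the catalog of the old one, so repeating the catalog name ends up treated as part of the target namespace. The workaround below is an assumption based on the error message, not a confirmed fix:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// As reported: repeating the catalog in the new identifier fails, because
// "h2" is interpreted as a namespace of the rename target.
// spark.sql("ALTER TABLE h2.test.empty_table RENAME TO h2.test.empty_table2")

// Assumed workaround: keep the new identifier catalog-relative.
spark.sql("ALTER TABLE h2.test.empty_table RENAME TO test.empty_table2")
{code}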
[jira] [Created] (SPARK-39400) spark-sql leaves the hive resource download dir behind after exit
angerszhu created SPARK-39400: - Summary: spark-sql leaves the hive resource download dir behind after exit Key: SPARK-39400 URL: https://issues.apache.org/jira/browse/SPARK-39400 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu Fix For: 3.4.0 {code:java} drwxrwxr-x 2 yi.zhu yi.zhu 4096 Jun 7 18:06 da92eec4-2db1-4941-9e53-b28c38e25e31_resources drwxrwxr-x 2 yi.zhu yi.zhu 4096 Jun 7 18:14 dad364e8-ed1d-4ced-a6df-4897361c69b1_resources drwxrwxr-x 2 yi.zhu yi.zhu 4096 Jun 7 18:13 ee0a2ee7-ff3e-4346-9181-e8e491b1ca15_resources drwxr-xr-x 2 yi.zhu yi.zhu 4096 Jun 7 18:16 hsperfdata_yi.zhu {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
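One plausible shape for the fix, sketched under assumptions (the function name and example path are hypothetical; this is not the actual patch): register a shutdown hook that removes the per-session "<uuid>_resources" download directory so spark-sql stops leaving it behind.

{code:java}
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator
import java.util.function.Consumer
import scala.util.Try

// Delete the session's resource download directory on JVM exit.
def registerResourceDirCleanup(downloadDir: Path): Unit = {
  sys.addShutdownHook {
    Try {
      // walk in reverse (deepest-first) order so files go before their directories
      Files.walk(downloadDir)
        .sorted(Comparator.reverseOrder[Path]())
        .forEach(new Consumer[Path] {
          override def accept(p: Path): Unit = { Files.deleteIfExists(p); () }
        })
    }
  }
}

// e.g. registerResourceDirCleanup(Paths.get("/tmp/<session-uuid>_resources"))
{code}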
[jira] [Created] (SPARK-39351) ShowCreateTable should redact properties
angerszhu created SPARK-39351: - Summary: ShowCreateTable should redact properties Key: SPARK-39351 URL: https://issues.apache.org/jira/browse/SPARK-39351 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu ShowCreateTable should redact properties -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39350) DescribeNamespace should redact properties
angerszhu created SPARK-39350: - Summary: DescribeNamespace should redact properties Key: SPARK-39350 URL: https://issues.apache.org/jira/browse/SPARK-39350 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu DescribeNamespace should redact properties -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan
[ https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544321#comment-17544321 ] angerszhu commented on SPARK-37609: --- Increasing -Xss can resolve this, but we should refactor the current code... > Transient StackOverflowError on DataFrame from Catalyst QueryPlan > - > > Key: SPARK-37609 > URL: https://issues.apache.org/jira/browse/SPARK-37609 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.2 > Environment: py:3.9 >Reporter: Rafal Wojdyla >Priority: Major > > I sporadically observe a StackOverflowError from Catalyst's QueryPlan (for a > relatively complicated query), below is a stacktrace from the {{count}} on > that DF. It's a bit troubling because it's a transient error, with enough > retries (no change to code, probably some kind of cache?), I can get the op > to work :( > {noformat} > --- > Py4JJavaError Traceback (most recent call last) > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/dataframe.py > in count(self) > 662 2 > 663 """ > --> 664 return int(self._jdf.count()) > 665 > 666 def collect(self): > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/java_gateway.py in > __call__(self, *args) >1302 >1303 answer = self.gateway_client.send_command(command) > -> 1304 return_value = get_return_value( >1305 answer, self.gateway_client, self.target_id, self.name) >1306 > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/utils.py in > deco(*a, **kw) > 109 def deco(*a, **kw): > 110 try: > --> 111 return f(*a, **kw) > 112 except py4j.protocol.Py4JJavaError as e: > 113 converted = convert_exception(e.java_exception) > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/protocol.py in > get_return_value(answer, gateway_client, target_id, name) > 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client) > 325 if answer[1] == REFERENCE_TYPE: > --> 326 raise Py4JJavaError( > 327 "An error occurred while calling {0}{1}{2}.\n". > 328 format(target_id, ".", name), value) > Py4JJavaError: An error occurred while calling o9123.count. > : java.lang.StackOverflowError > at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:188) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
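For reference, one way to apply the -Xss suggestion (the values are assumptions to tune per workload, and driver JVM options only take effect if set before the driver JVM starts, e.g. on the spark-submit command line):

{code:java}
// Equivalent spark-submit flags:
//   --conf spark.driver.extraJavaOptions=-Xss16m
//   --conf spark.executor.extraJavaOptions=-Xss16m
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Xss16m")   // bigger driver thread stacks
  .set("spark.executor.extraJavaOptions", "-Xss16m") // and on executors, if needed
{code}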
[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan
[ https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544281#comment-17544281 ] angerszhu commented on SPARK-37609: --- [~yumwang] It seems to be just a very complex table schema; it does not reproduce every time. I am telling the user to try increasing -Xss to see if that can resolve this problem. > Transient StackOverflowError on DataFrame from Catalyst QueryPlan > - > > Key: SPARK-37609 > URL: https://issues.apache.org/jira/browse/SPARK-37609 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.1.2 > Environment: py:3.9 >Reporter: Rafal Wojdyla >Priority: Major > > I sporadically observe a StackOverflowError from Catalyst's QueryPlan (for a > relatively complicated query), below is a stacktrace from the {{count}} on > that DF. It's a bit troubling because it's a transient error, with enough > retries (no change to code, probably some kind of cache?), I can get the op > to work :( > {noformat} > --- > Py4JJavaError Traceback (most recent call last) > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/dataframe.py > in count(self) > 662 2 > 663 """ > --> 664 return int(self._jdf.count()) > 665 > 666 def collect(self): > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/java_gateway.py in > __call__(self, *args) >1302 >1303 answer = self.gateway_client.send_command(command) > -> 1304 return_value = get_return_value( >1305 answer, self.gateway_client, self.target_id, self.name) >1306 > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/utils.py in > deco(*a, **kw) > 109 def deco(*a, **kw): > 110 try: > --> 111 return f(*a, **kw) > 112 except py4j.protocol.Py4JJavaError as e: > 113 converted = convert_exception(e.java_exception) > ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/protocol.py in > get_return_value(answer, gateway_client, target_id, name) > 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client) > 325 if answer[1] == REFERENCE_TYPE: > --> 326 raise Py4JJavaError( > 327 "An error occurred while calling {0}{1}{2}.\n". > 328 format(target_id, ".", name), value) > Py4JJavaError: An error occurred while calling o9123.count. > : java.lang.StackOverflowError > at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:188) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) > ... 
> {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan
[ https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544192#comment-17544192 ] angerszhu edited comment on SPARK-37609 at 5/31/22 7:43 AM: Same error in spark-3.1; the query is simple, but with so many nested columns it sometimes runs into a stack overflow. {code:java} 22/05/26 15:26:48 ERROR ApplicationMaster: User class threw exception: java.lang.StackOverflowError java.lang.StackOverflowError at scala.collection.TraversableOnce.nonEmpty(TraversableOnce.scala:114) at scala.collection.TraversableOnce.nonEmpty$(TraversableOnce.scala:114) at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:108) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPla
[jira] [Comment Edited] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan
[ https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544192#comment-17544192 ] angerszhu edited comment on SPARK-37609 at 5/31/22 7:42 AM: Same error in spark-3.1, query is simple, but so many nested columns. {code:java} 22/05/26 15:26:48 ERROR ApplicationMaster: User class threw exception: java.lang.StackOverflowError java.lang.StackOverflowError at scala.collection.TraversableOnce.nonEmpty(TraversableOnce.scala:114) at scala.collection.TraversableOnce.nonEmpty$(TraversableOnce.scala:114) at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:108) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache
[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan
[ https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544192#comment-17544192 ] angerszhu commented on SPARK-37609: --- Same error in spark-3.1 {code:java} 22/05/26 15:26:48 ERROR ApplicationMaster: User class threw exception: java.lang.StackOverflowError java.lang.StackOverflowError at scala.collection.TraversableOnce.nonEmpty(TraversableOnce.scala:114) at scala.collection.TraversableOnce.nonEmpty$(TraversableOnce.scala:114) at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:108) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359) at org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
[jira] [Created] (SPARK-39343) DescribeTableExec should redact properties
angerszhu created SPARK-39343: - Summary: DescribeTableExec should redact properties Key: SPARK-39343 URL: https://issues.apache.org/jira/browse/SPARK-39343 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu DescribeTableExec should redact properties -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39342) ShowTablePropertiesCommand/ShowTablePropertiesExec should redact properties.
[ https://issues.apache.org/jira/browse/SPARK-39342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39342: -- Summary: ShowTablePropertiesCommand/ShowTablePropertiesExec should redact properties. (was: ShowTablePropertiesCommand should redact properties.) > ShowTablePropertiesCommand/ShowTablePropertiesExec should redact properties. > > > Key: SPARK-39342 > URL: https://issues.apache.org/jira/browse/SPARK-39342 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > ShowTablePropertiesCommand should redact properties. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39342) ShowTablePropertiesCommand should redact properties.
angerszhu created SPARK-39342: - Summary: ShowTablePropertiesCommand should redact properties. Key: SPARK-39342 URL: https://issues.apache.org/jira/browse/SPARK-39342 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu ShowTablePropertiesCommand should redact properties. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39337) Refactor DescribeTableExec
angerszhu created SPARK-39337: - Summary: Refactor DescribeTableExec Key: SPARK-39337 URL: https://issues.apache.org/jira/browse/SPARK-39337 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu There is repeated code; refactor it. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39337) Refactor DescribeTableExec
[ https://issues.apache.org/jira/browse/SPARK-39337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39337: -- Description: There is repeated code; refactor it. {code:java} private def addTableDetails(rows: ArrayBuffer[InternalRow]): Unit = { rows += emptyRow() rows += toCatalystRow("# Detailed Table Information", "", "") rows += toCatalystRow("Name", table.name(), "") CatalogV2Util.TABLE_RESERVED_PROPERTIES.foreach(propKey => { if (table.properties.containsKey(propKey)) { rows += toCatalystRow(propKey.capitalize, table.properties.get(propKey), "") } }) val properties = table.properties.asScala.toList .filter(kv => !CatalogV2Util.TABLE_RESERVED_PROPERTIES.contains(kv._1)) .sortBy(_._1).map { case (key, value) => key + "=" + value }.mkString("[", ",", "]") rows += toCatalystRow("Table Properties", properties, "") } {code} was: There is repeated code; refactor it. > Refactor DescribeTableExec > -- > > Key: SPARK-39337 > URL: https://issues.apache.org/jira/browse/SPARK-39337 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > There is repeated code; refactor it. > {code:java} > private def addTableDetails(rows: ArrayBuffer[InternalRow]): Unit = { > rows += emptyRow() > rows += toCatalystRow("# Detailed Table Information", "", "") > rows += toCatalystRow("Name", table.name(), "") > CatalogV2Util.TABLE_RESERVED_PROPERTIES.foreach(propKey => { > if (table.properties.containsKey(propKey)) { > rows += toCatalystRow(propKey.capitalize, > table.properties.get(propKey), "") > } > }) > val properties = > table.properties.asScala.toList > .filter(kv => > !CatalogV2Util.TABLE_RESERVED_PROPERTIES.contains(kv._1)) > .sortBy(_._1).map { > case (key, value) => key + "=" + value > }.mkString("[", ",", "]") > rows += toCatalystRow("Table Properties", properties, "") > } > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
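A hedged sketch of the refactor direction (the helper name is illustrative, not the merged change): fold the repeated "drop reserved keys, sort, render k=v" logic into one reusable helper that each Exec can call.

{code:java}
// Renders non-reserved table properties as "[k1=v1,k2=v2,...]",
// mirroring the output format of the snippet quoted above.
def renderProperties(
    properties: Map[String, String],
    reservedKeys: Set[String]): String = {
  properties.toList
    .filterNot { case (key, _) => reservedKeys.contains(key) }
    .sortBy(_._1)
    .map { case (key, value) => s"$key=$value" }
    .mkString("[", ",", "]")
}
{code}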
[jira] [Updated] (SPARK-39335) DescribeTableCommand should redact properties
[ https://issues.apache.org/jira/browse/SPARK-39335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39335: -- Summary: DescribeTableCommand should redact properties (was: Redact table should redact properties) > DescribeTableCommand should redact properties > - > > Key: SPARK-39335 > URL: https://issues.apache.org/jira/browse/SPARK-39335 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > Now we only redact storage properties when describing a table; normal properties > should be redacted too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39336) Redact table/partition properties
angerszhu created SPARK-39336: - Summary: Redact table/partition properties Key: SPARK-39336 URL: https://issues.apache.org/jira/browse/SPARK-39336 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39335) Redact table should redact properties
[ https://issues.apache.org/jira/browse/SPARK-39335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39335: -- Parent: SPARK-39336 Issue Type: Sub-task (was: Task) > Redact table should redact properties > - > > Key: SPARK-39335 > URL: https://issues.apache.org/jira/browse/SPARK-39335 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > Now we only redact storage properties when describing a table; normal properties > should be redacted too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39335) Redact table should redact properties
angerszhu created SPARK-39335: - Summary: Redact table should redact properties Key: SPARK-39335 URL: https://issues.apache.org/jira/browse/SPARK-39335 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu Now we only redact storage properties when describing a table; normal properties should be redacted too. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
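A self-contained sketch of the redaction idea (inside Spark this is driven by org.apache.spark.util.Utils.redact and the spark.redaction.regex setting; the pattern and replacement text below are illustrative assumptions):

{code:java}
// Replace values whose keys look sensitive, keep the keys visible.
val redactionPattern = "(?i)secret|password|token".r

def redactProperties(props: Map[String, String]): Map[String, String] =
  props.map { case (key, value) =>
    if (redactionPattern.findFirstIn(key).isDefined) {
      key -> "*********(redacted)"
    } else {
      key -> value
    }
  }

// redactProperties(Map("password" -> "hunter2", "owner" -> "angerszhu"))
// => Map(password -> *********(redacted), owner -> angerszhu)
{code}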
[jira] [Updated] (SPARK-27442) ParquetFileFormat fails to read column named with invalid characters
[ https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-27442: -- Parent: SPARK-36200 Issue Type: Sub-task (was: Bug) > ParquetFileFormat fails to read column named with invalid characters > > > Key: SPARK-27442 > URL: https://issues.apache.org/jira/browse/SPARK-27442 > Project: Spark > Issue Type: Sub-task > Components: Input/Output >Affects Versions: 2.0.0, 2.4.1 >Reporter: Jan Vršovský >Assignee: angerszhu >Priority: Minor > Fix For: 3.3.0 > > > When reading a parquet file which contains characters considered invalid, the > reader fails with exception: > Name: org.apache.spark.sql.AnalysisException > Message: Attribute name "..." contains invalid character(s) among " > ,;{}()\n\t=". Please use alias to rename it. > Spark should not be able to write such files, but it should be able to read > it (and allow the user to correct it). However, possible workarounds (such as > using alias to rename the column, or forcing another schema) do not work, > since the check is done on the input. > (Possible fix: remove superficial > {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} from > {{buildReaderWithPartitionValues}} ?) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39285) Spark should not check field names when reading data
angerszhu created SPARK-39285: - Summary: Spark should not check field names when reading data Key: SPARK-39285 URL: https://issues.apache.org/jira/browse/SPARK-39285 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu Spark should not check field names when reading data -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39224) Ignore noisy warning message in ProcfsMetricsGetter
[ https://issues.apache.org/jira/browse/SPARK-39224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39224: -- Summary: Ignore noisy warning message in ProcfsMetricsGetter (was: Ignore noisy warning message in ProcessMetricsGetter) > Ignore noisy warning message in ProcfsMetricsGetter > > > Key: SPARK-39224 > URL: https://issues.apache.org/jira/browse/SPARK-39224 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > Many users complain about the noisy warning messages in > ProcfsMetricsGetter > {code:java} > 22/05/18 16:48:58 WARN ProcfsMetricsGetter: There was a problem with reading > the stat file of the process. > java.io.FileNotFoundException: /proc/50371/stat (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.<init>(FileInputStream.java:138) > at > org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:174) > at > org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:176) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2647) > at > org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:176) > at > org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:216) > at > scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23) > at scala.collection.immutable.Set$Set2.foreach(Set.scala:132) > at > org.apache.spark.executor.ProcfsMetricsGetter.computeAllMetrics(ProcfsMetricsGetter.scala:214) > at > org.apache.spark.metrics.ProcessTreeMetrics$.getMetricValues(ExecutorMetricType.scala:93) > at > org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1(ExecutorMetrics.scala:103) > at > org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1$adapted(ExecutorMetrics.scala:102) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.executor.ExecutorMetrics$.getCurrentMetrics(ExecutorMetrics.scala:102) > at > org.apache.spark.SparkContext.reportHeartBeat(SparkContext.scala:2578) > at org.apache.spark.SparkContext.$anonfun$new$30(SparkContext.scala:587) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2022) > at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
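The warning is noisy because a /proc/<pid>/stat file can legitimately disappear between discovering the pid and opening the file: the process simply exited. A minimal sketch of the quieting idea (the method name is illustrative, not the actual patch) treats that as an expected condition instead of warning with a full stack trace:

{code:java}
import java.io.{BufferedReader, FileInputStream, FileNotFoundException, InputStreamReader}

// Read the first line of /proc/<pid>/stat, or None if the process is gone.
def readProcStatLine(pid: Long): Option[String] =
  try {
    val reader = new BufferedReader(
      new InputStreamReader(new FileInputStream(s"/proc/$pid/stat")))
    try Option(reader.readLine()) finally reader.close()
  } catch {
    case _: FileNotFoundException =>
      None // process already exited; skip quietly instead of logging a WARN
  }
{code}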
[jira] [Created] (SPARK-39224) Ignore noisy warning message in ProcessMetricsGetter
angerszhu created SPARK-39224: - Summary: Ignore noisy warning message in ProcessMetricsGetter Key: SPARK-39224 URL: https://issues.apache.org/jira/browse/SPARK-39224 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39224) Ignore noisy warning message in ProcessMetricsGetter
[ https://issues.apache.org/jira/browse/SPARK-39224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-39224: -- Description: There are many user complain about the noisy warning messages in ProcessMetricsGetter {code:java} 22/05/18 16:48:58 WARN ProcfsMetricsGetter: There was a problem with reading the stat file of the process. java.io.FileNotFoundException: /proc/50371/stat (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.(FileInputStream.java:138) at org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:174) at org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:176) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2647) at org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:176) at org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:216) at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23) at scala.collection.immutable.Set$Set2.foreach(Set.scala:132) at org.apache.spark.executor.ProcfsMetricsGetter.computeAllMetrics(ProcfsMetricsGetter.scala:214) at org.apache.spark.metrics.ProcessTreeMetrics$.getMetricValues(ExecutorMetricType.scala:93) at org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1(ExecutorMetrics.scala:103) at org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1$adapted(ExecutorMetrics.scala:102) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.executor.ExecutorMetrics$.getCurrentMetrics(ExecutorMetrics.scala:102) at org.apache.spark.SparkContext.reportHeartBeat(SparkContext.scala:2578) at org.apache.spark.SparkContext.$anonfun$new$30(SparkContext.scala:587) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2022) at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) {code} > Ignore noisy. warning message in ProcessMetricsGetter > - > > Key: SPARK-39224 > URL: https://issues.apache.org/jira/browse/SPARK-39224 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > > There are many user complain about the noisy warning messages in > ProcessMetricsGetter > {code:java} > 22/05/18 16:48:58 WARN ProcfsMetricsGetter: There was a problem with reading > the stat file of the process. 
> java.io.FileNotFoundException: /proc/50371/stat (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.<init>(FileInputStream.java:138) > at > org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:174) > at > org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:176) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2647) > at > org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:176) > at > org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:216) > at > scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23) > at scala.collection.immutable.Set$Set2.foreach(Set.scala:132) > at > org.apache.spark.executor.ProcfsMetricsGetter.computeAllMetrics(ProcfsMetricsGetter.scala:214) > at > org.apache.spark.metrics.ProcessTreeMetrics$.getMetricValues(ExecutorMetricType.scala:93) > at > org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1(ExecutorMetrics.scala:103) > at > org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1$adapted(ExecutorMetrics.scala:102) > at scala.collection.Iterator.foreach(Iterator.s
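The warning above fires whenever a process in the tree exits between enumeration and the read of its /proc/<pid>/stat file, which is a normal race during heartbeats. A minimal, self-contained sketch of the idea (not the actual patch; names and handling are illustrative):

{code:java}
import java.io.{BufferedReader, FileInputStream, FileNotFoundException, InputStreamReader}

// Treat a missing /proc/<pid>/stat as an expected race (the process already
// exited) and return None quietly instead of emitting a WARN on every heartbeat.
def readProcStat(pid: Long): Option[String] = {
  try {
    val in = new BufferedReader(new InputStreamReader(new FileInputStream(s"/proc/$pid/stat")))
    try Some(in.readLine()) finally in.close()
  } catch {
    case _: FileNotFoundException => None // process is gone; nothing to report loudly
  }
}
{code}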
[jira] [Created] (SPARK-39195) Spark should use two-step update of OutputCommitCoordinator
angerszhu created SPARK-39195: - Summary: Spark should use two-step update of OutputCommitCoordinator Key: SPARK-39195 URL: https://issues.apache.org/jira/browse/SPARK-39195 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39178) When throwing SparkFatalException, should show root cause too.
angerszhu created SPARK-39178: - Summary: When throwing SparkFatalException, should show root cause too. Key: SPARK-39178 URL: https://issues.apache.org/jira/browse/SPARK-39178 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.2.1, 3.3.0 Reporter: angerszhu Fix For: 3.4.0 We have a query that throws SparkFatalException without a root cause. {code:java} at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:163) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525) at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453) at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:746) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:92) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark.sql.execution.SortExec.doExecute(SortExec.scala:112) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:222) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
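For context, SparkFatalException wraps a throwable; when the wrapped throwable is not passed along as the exception cause, traces like the one above lose the "Caused by:" section. A minimal sketch with a stand-in class (not Spark's actual definition) showing the idea:

{code:java}
// Stand-in for org.apache.spark.util.SparkFatalException, for illustration only.
// Passing the original throwable as the cause makes printStackTrace / log output
// include the underlying "Caused by:" chain.
class SparkFatalException(val throwable: Throwable) extends Exception(throwable)

def rethrowAsFatal[T](work: => T): T =
  try work
  catch { case t: Throwable => throw new SparkFatalException(t) }
{code}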
[jira] [Created] (SPARK-39136) JDBCTable support properties
angerszhu created SPARK-39136: - Summary: JDBCTable support properties Key: SPARK-39136 URL: https://issues.apache.org/jira/browse/SPARK-39136 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: angerszhu Fix For: 3.4.0 {code:java} > > desc formatted jdbc.test.people; NAME string ID int # Partitioning Not partitioned # Detailed Table Information Name test.people Table Properties [] Time taken: 0.048 seconds, Fetched 9 row(s) {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39136) JDBCTable support properties
[ https://issues.apache.org/jira/browse/SPARK-39136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534220#comment-17534220 ] angerszhu commented on SPARK-39136: --- Will raise a ticket soon. > JDBCTable support properties > > > Key: SPARK-39136 > URL: https://issues.apache.org/jira/browse/SPARK-39136 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: angerszhu >Priority: Major > Fix For: 3.4.0 > > > {code:java} > > > > desc formatted jdbc.test.people; > NAME string > ID int > # Partitioning > Not partitioned > # Detailed Table Information > Name test.people > Table Properties [] > Time taken: 0.048 seconds, Fetched 9 row(s) > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
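A sketch of how such a table is typically reached, assuming Spark's DataSource V2 JDBC catalog and a placeholder H2 database; with the requested change, per-table options would surface under "Table Properties" instead of the empty [] shown above:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.catalog.jdbc",
    "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
  .config("spark.sql.catalog.jdbc.url", "jdbc:h2:mem:testdb") // placeholder URL
  .config("spark.sql.catalog.jdbc.driver", "org.h2.Driver")   // placeholder driver
  .getOrCreate()

// Assumes a table "people" already exists in the "test" schema of that database.
spark.sql("DESC FORMATTED jdbc.test.people").show(truncate = false)
{code}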
[jira] [Created] (SPARK-39110) Add metrics properties to Environment page
angerszhu created SPARK-39110: - Summary: Add metrics properties to Environment page Key: SPARK-39110 URL: https://issues.apache.org/jira/browse/SPARK-39110 Project: Spark Issue Type: Task Components: Web UI Affects Versions: 3.3.0 Reporter: angerszhu We have different ways to load the metrics configuration, and users may not be sure which one actually takes effect, so we can add this information to the Environment tab. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
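The "different ways" include a metrics.properties file and inline spark.metrics.conf.* entries; a hedged sketch of both (the path and sink choice are illustrative), which is exactly the ambiguity an Environment-page listing would resolve:

{code:java}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Way 1: point at a metrics.properties file (placeholder path).
  .set("spark.metrics.conf", "/etc/spark/metrics.properties")
  // Way 2: inline properties, which take precedence over the file.
  .set("spark.metrics.conf.*.sink.jmx.class", "org.apache.spark.metrics.sink.JmxSink")
{code}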
[jira] [Created] (SPARK-39043) Hive client should not gather statistics by default.
angerszhu created SPARK-39043: - Summary: Hive client should not gather statistics by default. Key: SPARK-39043 URL: https://issues.apache.org/jira/browse/SPARK-39043 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0, 3.1.2, 3.3.0 Reporter: angerszhu When using `InsertIntoHiveTable` to insert overwrite a partition, it will call Hive.loadPartition(); in this method, when `hive.stats.autogather` is true (the default is true): {code:java} if (oldPart == null) { newTPart.getTPartition().setParameters(new HashMap()); if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) { StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(), StatsSetupConst.TRUE); } public static void setBasicStatsStateForCreateTable(Map params, String setting) { if (TRUE.equals(setting)) { for (String stat : StatsSetupConst.supportedStats) { params.put(stat, "0"); } } setBasicStatsState(params, setting); } public static final String[] supportedStats = {NUM_FILES,ROW_COUNT,TOTAL_SIZE,RAW_DATA_SIZE}; {code} It then sets the default row count to 0, but since Spark only updates numFiles and rawSize, the row count remains 0. This impacts other systems such as Presto's CBO. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
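A workaround sketch (not the proposed fix): explicitly turn off Hive's auto-gathered basic stats so loadPartition does not seed the partition with a zero row count:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  // spark.hadoop.* entries are copied into the Hadoop/Hive configuration.
  .config("spark.hadoop.hive.stats.autogather", "false")
  .getOrCreate()
{code}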
[jira] [Updated] (SPARK-38910) Clean sparkStaging dir before unregister()
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38910: -- Description: {code:java} ShutdownHookManager.addShutdownHook(priority) { () => try { val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf) val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts if (!finished) { // The default state of ApplicationMaster is failed if it is invoked by shut down hook. // This behavior is different compared to 1.x version. // If user application is exited ahead of time by calling System.exit(N), here mark // this application as failed with EXIT_EARLY. For a good shutdown, user shouldn't call // System.exit(0) to terminate the application. finish(finalStatus, ApplicationMaster.EXIT_EARLY, "Shutdown hook called before final status was reported.") } if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(new Path(System.getenv("SPARK_YARN_STAGING_DIR"))) } } } catch { case e: Throwable => logWarning("Ignoring Exception while stopping ApplicationMaster from shutdown hook", e) } }{code} unregister() may throw an exception, so the staging dir should be cleaned before unregister() is called. was: {code:java} {code} The staging dir is not cleaned when the following case matches {code:java} !launcherBackend.isConnected() && fireAndForget {code} > Clean sparkStaging dir before unregister() > - > > Key: SPARK-38910 > URL: https://issues.apache.org/jira/browse/SPARK-38910 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.1, 3.3.0 >Reporter: angerszhu >Priority: Major > > {code:java} > ShutdownHookManager.addShutdownHook(priority) { () => > try { > val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf) > val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts > if (!finished) { > // The default state of ApplicationMaster is failed if it is > invoked by shut down hook. > // This behavior is different compared to 1.x version. > // If user application is exited ahead of time by calling > System.exit(N), here mark > // this application as failed with EXIT_EARLY. For a good > shutdown, user shouldn't call > // System.exit(0) to terminate the application. > finish(finalStatus, > ApplicationMaster.EXIT_EARLY, > "Shutdown hook called before final status was reported.") > } > if (!unregistered) { > // we only want to unregister if we don't want the RM to retry > if (finalStatus == FinalApplicationStatus.SUCCEEDED || > isLastAttempt) { > unregister(finalStatus, finalMsg) > cleanupStagingDir(new > Path(System.getenv("SPARK_YARN_STAGING_DIR"))) > } > } > } catch { > case e: Throwable => > logWarning("Ignoring Exception while stopping ApplicationMaster > from shutdown hook", e) > } > }{code} > unregister() may throw an exception, so the staging dir should be cleaned before unregister() is called. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
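The proposed ordering, reduced to a minimal sketch with stand-in functions for the ApplicationMaster's cleanupStagingDir() and unregister():

{code:java}
// Clean first: if unregister() throws, the staging dir is already removed.
def shutdown(cleanupStagingDir: () => Unit, unregister: () => Unit): Unit = {
  cleanupStagingDir()
  unregister()
}
{code}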
[jira] [Updated] (SPARK-38910) Clean sparkStaging dir before unregister()
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38910: -- Description: {code:java} {code} The staging dir is not cleaned when the following case matches {code:java} !launcherBackend.isConnected() && fireAndForget {code} was: {code:java} def run(): Unit = { submitApplication() if (!launcherBackend.isConnected() && fireAndForget) { val report = getApplicationReport(appId) val state = report.getYarnApplicationState logInfo(s"Application report for $appId (state: $state)") logInfo(formatReportDetails(report, getDriverLogsLink(report))) if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) { throw new SparkException(s"Application $appId finished with status: $state") } } else { val YarnAppReport(appState, finalState, diags) = monitorApplication(appId) if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) { var amContainerSucceed = false val amContainerExitMsg = s"AM Container for " + s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " + s"exited with exitCode: 0" diags.foreach { err => logError(s"Application diagnostics message: $err") if (err.contains(amContainerExitMsg)) { amContainerSucceed = true {code} The staging dir is not cleaned when the following case matches {code:java} !launcherBackend.isConnected() && fireAndForget {code} > Clean sparkStaging dir before unregister() > - > > Key: SPARK-38910 > URL: https://issues.apache.org/jira/browse/SPARK-38910 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.1, 3.3.0 >Reporter: angerszhu >Priority: Major > > {code:java} > {code} > The staging dir is not cleaned when the following case matches > {code:java} > !launcherBackend.isConnected() && fireAndForget > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38910) Clean sparkStaging dir before unregister()
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38910: -- Summary: Clean sparkStaging dir before unregister() (was: Clean sparkStaging dir should bef) > Clean sparkStaging dir before unregister() > - > > Key: SPARK-38910 > URL: https://issues.apache.org/jira/browse/SPARK-38910 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.1, 3.3.0 >Reporter: angerszhu >Priority: Major > > {code:java} > def run(): Unit = { > submitApplication() > if (!launcherBackend.isConnected() && fireAndForget) { > val report = getApplicationReport(appId) > val state = report.getYarnApplicationState > logInfo(s"Application report for $appId (state: $state)") > logInfo(formatReportDetails(report, getDriverLogsLink(report))) > if (state == YarnApplicationState.FAILED || state == > YarnApplicationState.KILLED) { > throw new SparkException(s"Application $appId finished with status: > $state") > } > } else { > val YarnAppReport(appState, finalState, diags) = > monitorApplication(appId) > if (appState == YarnApplicationState.FAILED || finalState == > FinalApplicationStatus.FAILED) { > var amContainerSucceed = false > val amContainerExitMsg = s"AM Container for " + > > s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " + > s"exited with exitCode: 0" > diags.foreach { err => > logError(s"Application diagnostics message: $err") > if (err.contains(amContainerExitMsg)) { > amContainerSucceed = true > > {code} > The staging dir is not cleaned when the following case matches > {code:java} > !launcherBackend.isConnected() && fireAndForget > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should bef
[ https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38910: -- Summary: Clean sparkStaging dir should bef (was: Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too) > Clean sparkStaging dir should bef > - > > Key: SPARK-38910 > URL: https://issues.apache.org/jira/browse/SPARK-38910 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.1, 3.3.0 >Reporter: angerszhu >Priority: Major > > {code:java} > def run(): Unit = { > submitApplication() > if (!launcherBackend.isConnected() && fireAndForget) { > val report = getApplicationReport(appId) > val state = report.getYarnApplicationState > logInfo(s"Application report for $appId (state: $state)") > logInfo(formatReportDetails(report, getDriverLogsLink(report))) > if (state == YarnApplicationState.FAILED || state == > YarnApplicationState.KILLED) { > throw new SparkException(s"Application $appId finished with status: > $state") > } > } else { > val YarnAppReport(appState, finalState, diags) = > monitorApplication(appId) > if (appState == YarnApplicationState.FAILED || finalState == > FinalApplicationStatus.FAILED) { > var amContainerSucceed = false > val amContainerExitMsg = s"AM Container for " + > > s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " + > s"exited with exitCode: 0" > diags.foreach { err => > logError(s"Application diagnostics message: $err") > if (err.contains(amContainerExitMsg)) { > amContainerSucceed = true > > {code} > The staging dir is not cleaned when the following case matches > {code:java} > !launcherBackend.isConnected() && fireAndForget > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38910) Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too
angerszhu created SPARK-38910: - Summary: Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too Key: SPARK-38910 URL: https://issues.apache.org/jira/browse/SPARK-38910 Project: Spark Issue Type: Task Components: YARN Affects Versions: 3.2.1, 3.3.0 Reporter: angerszhu {code:java} def run(): Unit = { submitApplication() if (!launcherBackend.isConnected() && fireAndForget) { val report = getApplicationReport(appId) val state = report.getYarnApplicationState logInfo(s"Application report for $appId (state: $state)") logInfo(formatReportDetails(report, getDriverLogsLink(report))) if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) { throw new SparkException(s"Application $appId finished with status: $state") } } else { val YarnAppReport(appState, finalState, diags) = monitorApplication(appId) if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) { var amContainerSucceed = false val amContainerExitMsg = s"AM Container for " + s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " + s"exited with exitCode: 0" diags.foreach { err => logError(s"Application diagnostics message: $err") if (err.contains(amContainerExitMsg)) { amContainerSucceed = true {code} The staging dir is not cleaned when the following case matches {code:java} !launcherBackend.isConnected() && fireAndForget {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38498) Support adding StreamingListener by conf
[ https://issues.apache.org/jira/browse/SPARK-38498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38498: -- Description: Currently, if users want to add a customized StreamingListener to a StreamingContext, they need to add the listener in their own code. {code:java} streamingContext.addStreamingListener() {code} We should support using a conf to add it. > Support adding StreamingListener by conf > - > > Key: SPARK-38498 > URL: https://issues.apache.org/jira/browse/SPARK-38498 > Project: Spark > Issue Type: Task > Components: SQL, Structured Streaming >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > Currently, if users want to add a customized StreamingListener to a > StreamingContext, they need to add the listener in their own code. > {code:java} > streamingContext.addStreamingListener() > {code} > We should support using a conf to add it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
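A hypothetical sketch of conf-driven registration, modeled on what spark.extraListeners does for SparkListener; the key name "spark.streaming.extraListeners" is made up for illustration:

{code:java}
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.StreamingListener

def registerConfiguredListeners(ssc: StreamingContext): Unit = {
  val names = ssc.sparkContext.getConf
    .get("spark.streaming.extraListeners", "") // hypothetical conf key
    .split(",").map(_.trim).filter(_.nonEmpty)
  names.foreach { name =>
    // Instantiate each listener reflectively via its no-arg constructor.
    val listener = Class.forName(name)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[StreamingListener]
    ssc.addStreamingListener(listener)
  }
}
{code}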
[jira] [Created] (SPARK-38498) Support adding StreamingListener by conf
angerszhu created SPARK-38498: - Summary: Support adding StreamingListener by conf Key: SPARK-38498 URL: https://issues.apache.org/jira/browse/SPARK-38498 Project: Spark Issue Type: Task Components: SQL, Structured Streaming Affects Versions: 3.2.1 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38459) IsolatedClientLoader should use built-in Hadoop version
angerszhu created SPARK-38459: - Summary: IsolatedClientLoader should use built-in Hadoop version Key: SPARK-38459 URL: https://issues.apache.org/jira/browse/SPARK-38459 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu According to https://github.com/apache/spark/pull/34855#discussion_r822266139 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
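For context, a sketch of the confs that send Hive client jar resolution down the download path where the chosen Hadoop version matters; the version values are examples only:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "2.3.9") // example version
  .config("spark.sql.hive.metastore.jars", "maven")    // triggers jar download via IsolatedClientLoader
  .getOrCreate()
{code}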
[jira] [Updated] (SPARK-38449) Don't call createTable when ifNotExists=true and table exists
[ https://issues.apache.org/jira/browse/SPARK-38449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38449: -- Description: In the current v1 code, when ignoreTableExists = true and the table already exists, createTable is still called. It's not necessary was:In the current v1 code, when ignoreTableExists = true and the table already exists, createTable is still called. > Don't call createTable when ifNotExists=true and table exists > -- > > Key: SPARK-38449 > URL: https://issues.apache.org/jira/browse/SPARK-38449 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > In the current v1 code, when ignoreTableExists = true and the table already > exists, createTable is still called. It's not necessary -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38449) Don't call createTable when ifNotExists=true and table exists
angerszhu created SPARK-38449: - Summary: Don't call createTable when ifNotExists=true and table exists Key: SPARK-38449 URL: https://issues.apache.org/jira/browse/SPARK-38449 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu In the current v1 code, when ignoreTableExists = true and the table already exists, createTable is still called. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
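The proposed short-circuit, reduced to a sketch with stand-in names for the v1 write path:

{code:java}
def maybeCreateTable(tableExists: Boolean, ignoreIfExists: Boolean)(create: () => Unit): Unit =
  if (tableExists && ignoreIfExists) () // already there and "if not exists": skip the remote call
  else create()
{code}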
[jira] [Created] (SPARK-38382) Refactor migration guide's sentences
angerszhu created SPARK-38382: - Summary: Refactor migration guide's sentences Key: SPARK-38382 URL: https://issues.apache.org/jira/browse/SPARK-38382 Project: Spark Issue Type: Task Components: Documentation Affects Versions: 3.2.1 Reporter: angerszhu The current migration guide uses both "Since Spark x.x.x" and "In Spark x.x.x"; we should unify them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38358) Add migration guide for spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas
angerszhu created SPARK-38358: - Summary: Add migration guide for spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas Key: SPARK-38358 URL: https://issues.apache.org/jira/browse/SPARK-38358 Project: Spark Issue Type: Task Components: Documentation, SQL Affects Versions: 3.2.1, 3.1.2, 3.0.3 Reporter: angerszhu After we migrated to Spark 3, many jobs throw exceptions since the data source API can't support overwriting a partitioned table while also reading from the same table. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38294) DDLUtils.verifyNotReadPath should check target is subDir
angerszhu created SPARK-38294: - Summary: DDLUtils.verifyNotReadPath should check target is subDir Key: SPARK-38294 URL: https://issues.apache.org/jira/browse/SPARK-38294 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu {code} [info] Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 0.0 in stage 14.0 (TID 15) (10.12.190.176 executor driver): org.apache.spark.SparkException: Task failed while writing rows. [info] at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:577) [info] at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:345) [info] at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$20(FileFormatWriter.scala:252) [info] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) [info] at org.apache.spark.scheduler.Task.run(Task.scala:136) [info] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1475) [info] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:748) [info] Caused by: java.io.FileNotFoundException: [info] File file:/Users/yi.zhu/Documents/project/Angerszh/spark/target/tmp/spark-f1c6b035-e585-4c0e-9b83-17ad54e85978/dt=2020-09-10/part-0-855b7af4-fe2b-4933-807a-6bf40eab11ba.c000.snappy.parquet does not exist [info] [info] It is possible the underlying files have been updated. You can explicitly invalidate [info] the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by [info] recreating the Dataset/DataFrame involved. 
[info] [info] at org.apache.spark.sql.errors.QueryExecutionErrors$.readCurrentFileNotFoundError(QueryExecutionErrors.scala:583) [info] at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:212) [info] at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:270) [info] at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116) [info] at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:548) [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source) [info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) [info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) [info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) [info] at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91) [info] at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:328) [info] at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1509) [info] at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:335) [info] ... 9 more [info] [info] Driver stacktrace: {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
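A sketch of the subdirectory-aware check the title suggests, with semantics assumed from the failure above (the overwrite target and a read path conflict when either contains the other):

{code:java}
import org.apache.hadoop.fs.Path

def conflicts(readPath: Path, outputPath: Path): Boolean = {
  // All ancestors of p, including p itself, up to the root.
  def ancestors(p: Path): Seq[Path] =
    Iterator.iterate(p)(_.getParent).takeWhile(_ != null).toSeq
  ancestors(outputPath).contains(readPath) || ancestors(readPath).contains(outputPath)
}
{code}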
[jira] [Updated] (SPARK-38270) SQL CLI AM should keep the same exit code as the client
[ https://issues.apache.org/jira/browse/SPARK-38270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38270: -- Parent: SPARK-36623 Issue Type: Sub-task (was: Task) > SQL CLI AM should keep the same exit code as the client > > > Key: SPARK-38270 > URL: https://issues.apache.org/jira/browse/SPARK-38270 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > Currently for SQL CLI, we always use a shutdown hook to stop the SparkContext > {code:java} > // Clean up after we exit > ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() } > {code} > This causes the YARN AM to always report success even when the client exits with a non-zero code. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
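The gist of the problem is that stopping inside a shutdown hook hides the client's status. A minimal sketch of the alternative, stopping explicitly and exiting with the real code (stop() is a stand-in for SparkSQLEnv.stop()):

{code:java}
def runCli(body: => Int, stop: () => Unit): Unit = {
  val exitCode =
    try body
    catch { case _: Throwable => 1 }
  stop()                 // e.g. SparkSQLEnv.stop() in the real CLI
  System.exit(exitCode)  // non-zero codes now reach YARN instead of being masked
}
{code}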
[jira] [Commented] (SPARK-38293) Fix flaky test of HealthTrackerIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-38293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496422#comment-17496422 ] angerszhu commented on SPARK-38293: --- cc [~dongjoon] [~hyukjin.kwon] Have met this several times. > Fix flaky test of HealthTrackerIntegrationSuite > > > Key: SPARK-38293 > URL: https://issues.apache.org/jira/browse/SPARK-38293 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > {code:java} > [info] HealthTrackerIntegrationSuite: > [info] - If preferred node is bad, without excludeOnFailure job will fail > (120 milliseconds) > [info] - With default settings, job can succeed despite multiple bad > executors on node (3 seconds, 78 milliseconds) > [info] - Bad node with multiple executors, job will still succeed with the > right confs *** FAILED *** (61 milliseconds) > [info] Map() did not equal Map(0 -> 42, 5 -> 42, 1 -> 42, 6 -> 42, 9 -> 42, > 2 -> 42, 7 -> 42, 3 -> 42, 8 -> 42, 4 -> 42) > (HealthTrackerIntegrationSuite.scala:94) > [info] Analysis: > [info] HashMap(0: -> 42, 1: -> 42, 2: -> 42, 3: -> 42, 4: -> 42, 5: -> 42, > 6: -> 42, 7: -> 42, 8: -> 42, 9: -> 42) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38293) Fix flaky test of HealthTrackerIntegrationSuite
angerszhu created SPARK-38293: - Summary: Fix flaky test of HealthTrackerIntegrationSuite Key: SPARK-38293 URL: https://issues.apache.org/jira/browse/SPARK-38293 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.2.1 Reporter: angerszhu {code:java} [info] HealthTrackerIntegrationSuite: [info] - If preferred node is bad, without excludeOnFailure job will fail (120 milliseconds) [info] - With default settings, job can succeed despite multiple bad executors on node (3 seconds, 78 milliseconds) [info] - Bad node with multiple executors, job will still succeed with the right confs *** FAILED *** (61 milliseconds) [info] Map() did not equal Map(0 -> 42, 5 -> 42, 1 -> 42, 6 -> 42, 9 -> 42, 2 -> 42, 7 -> 42, 3 -> 42, 8 -> 42, 4 -> 42) (HealthTrackerIntegrationSuite.scala:94) [info] Analysis: [info] HashMap(0: -> 42, 1: -> 42, 2: -> 42, 3: -> 42, 4: -> 42, 5: -> 42, 6: -> 42, 7: -> 42, 8: -> 42, 9: -> 42) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38289) Refactor SQL CLI exit code related code
angerszhu created SPARK-38289: - Summary: Refactor SQL CLI exit code related code Key: SPARK-38289 URL: https://issues.apache.org/jira/browse/SPARK-38289 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu Refactor SQL CLI exit code related code -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38270) SQL CLI AM should keep the same exit code as the client
[ https://issues.apache.org/jira/browse/SPARK-38270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-38270: -- Description: Currently for SQL CLI, we always use a shutdown hook to stop the SparkContext {code:java} // Clean up after we exit ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() } {code} This causes the YARN AM to always report success even when the client exits with a non-zero code. > SQL CLI AM should keep the same exit code as the client > > > Key: SPARK-38270 > URL: https://issues.apache.org/jira/browse/SPARK-38270 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Priority: Major > > Currently for SQL CLI, we always use a shutdown hook to stop the SparkContext > {code:java} > // Clean up after we exit > ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() } > {code} > This causes the YARN AM to always report success even when the client exits with a non-zero code. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38270) SQL CLI AM should keep the same exit code as the client
angerszhu created SPARK-38270: - Summary: SQL CLI AM should keep the same exit code as the client Key: SPARK-38270 URL: https://issues.apache.org/jira/browse/SPARK-38270 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38215) InsertIntoHiveDir support convert metadata
angerszhu created SPARK-38215: - Summary: InsertIntoHiveDir support convert metadata Key: SPARK-38215 URL: https://issues.apache.org/jira/browse/SPARK-38215 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu The current InsertIntoHiveDir command uses the Hive SerDe to write data and doesn't support conversion, which means such SQL can't write Parquet with zstd. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
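A sketch of what conversion would enable, assuming the spark.sql.hive.convertMetastoreInsertDir conf mentioned in SPARK-38358 above: with conversion on, the directory insert goes through the Parquet data source, so Parquet-side options such as zstd compression apply.

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .config("spark.sql.hive.convertMetastoreInsertDir", "true")
  .config("spark.sql.parquet.compression.codec", "zstd")
  .getOrCreate()

spark.sql(
  """INSERT OVERWRITE DIRECTORY '/tmp/out'
    |STORED AS PARQUET
    |SELECT 1 AS id""".stripMargin) // '/tmp/out' is a placeholder path
{code}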
[jira] [Created] (SPARK-38197) Improve error message of BlockManager.fetchRemoteManagedBuffer
angerszhu created SPARK-38197: - Summary: Improve error message of BlockManager.fetchRemoteManagedBuffer Key: SPARK-38197 URL: https://issues.apache.org/jira/browse/SPARK-38197 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1 Reporter: angerszhu Fix For: 3.3.0 Some fetch failure messages don't show the fetch information. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
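The spirit of the improvement, as a generic sketch (the block id and host here are simplified stand-ins for Spark's BlockId and BlockManagerId):

{code:java}
def fetchWithContext[T](blockId: String, host: String)(fetch: => T): T =
  try fetch
  catch {
    case e: Exception =>
      // Name what was being fetched and from where, instead of a bare failure.
      throw new RuntimeException(s"Failed to fetch block $blockId from $host", e)
  }
{code}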
[jira] [Created] (SPARK-38150) Update comment of RelationConversions
angerszhu created SPARK-38150: - Summary: Update comment of RelationConversions Key: SPARK-38150 URL: https://issues.apache.org/jira/browse/SPARK-38150 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1, 3.2.0 Reporter: angerszhu Fix For: 3.3.0 The current comment of RelationConversions is not correct. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema
[ https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488612#comment-17488612 ] angerszhu commented on SPARK-35531: --- Sure. > Can not insert into hive bucket table if create table with upper case schema > > > Key: SPARK-35531 > URL: https://issues.apache.org/jira/browse/SPARK-35531 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.1, 3.2.0 >Reporter: Hongyi Zhang >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > > > create table TEST1( > V1 BIGINT, > S1 INT) > partitioned by (PK BIGINT) > clustered by (V1) > sorted by (S1) > into 200 buckets > STORED AS PARQUET; > > insert into test1 > select > * from values(1,1,1); > > > org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not > part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), > FieldSchema(name:s1, type:int, comment:null)] > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not > part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), > FieldSchema(name:s1, type:int, comment:null)] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
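The underlying fix idea, sketched: resolve user-written bucket column names against the stored (lower-cased) schema case-insensitively before handing them to Hive.

{code:java}
import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}

def normalizeBucketCols(schema: StructType, bucketCols: Seq[String]): Seq[String] =
  bucketCols.map { c =>
    schema.fields
      .find(_.name.equalsIgnoreCase(c))
      .map(_.name) // use the schema's stored casing
      .getOrElse(throw new IllegalArgumentException(s"Bucket column $c not found"))
  }

val schema = StructType(Seq(StructField("v1", LongType), StructField("s1", IntegerType)))
assert(normalizeBucketCols(schema, Seq("V1")) == Seq("v1")) // "V1" from the DDL resolves to "v1"
{code}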
[jira] [Created] (SPARK-38043) Refactor FileDataSourceBaseSuite
angerszhu created SPARK-38043: - Summary: Refactor FileDataSourceBaseSuite Key: SPARK-38043 URL: https://issues.apache.org/jira/browse/SPARK-38043 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: angerszhu Refactor FileDataSourceBaseSuite to build a test framework for data sources. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37985) Fix flaky test SPARK-37578
angerszhu created SPARK-37985: - Summary: Fix flaky test SPARK-37578 Key: SPARK-37985 URL: https://issues.apache.org/jira/browse/SPARK-37985 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu 2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: Report metrics from Datasource v2 write (90 milliseconds) 2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: Update output metrics from Datasource v2 *** FAILED *** (65 milliseconds) 2022-01-22T01:58:29.9428038Z [info] 123 did not equal 246 (SQLAppStatusListenerSuite.scala:936) 2022-01-22T01:58:29.9428531Z [info] org.scalatest.exceptions.TestFailedException: 2022-01-22T01:58:29.9429101Z [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) 2022-01-22T01:58:29.9429717Z [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) 2022-01-22T01:58:29.9430298Z [info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) 2022-01-22T01:58:29.9430840Z [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) 2022-01-22T01:58:29.9431512Z [info] at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936) 2022-01-22T01:58:29.9432305Z [info] at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905) 2022-01-22T01:58:29.9432982Z [info] at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79) 2022-01-22T01:58:29.9433695Z [info] at org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78) 2022-01-22T01:58:29.9434276Z [info] at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221) 2022-01-22T01:58:29.9435040Z [info] at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63) 2022-01-22T01:58:29.9435764Z [info] at org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78) 2022-01-22T01:58:29.9436354Z [info] at org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77) 2022-01-22T01:58:29.9437063Z [info] at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63) 2022-01-22T01:58:29.9437851Z [info] at org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905) -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org