[jira] [Updated] (SPARK-48292) Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2024-05-21 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48292:
--
Summary: Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should 
abort stage when committed file not consistent with task status  (was: Improve 
stage failure reason message in OutputCommitCoordinator )

> Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage 
> when committed file not consistent with task status
> --
>
> Key: SPARK-48292
> URL: https://issues.apache.org/jira/browse/SPARK-48292
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Priority: Minor
>
> When a task attempt fails but it has been authorized to do the task commit, 
> OutputCommitCoordinator fails the stage with a reason message saying that the 
> task commit succeeded, but the driver actually never knows whether a task 
> commit succeeded or not. We should update the reason message to make it less 
> confusing.
> See https://github.com/apache/spark/pull/36564#discussion_r1598660630






[jira] [Updated] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz

2024-05-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48340:
--
Attachment: image-2024-05-20-18-38-39-769.png

> Support TimestampNTZ  infer schema miss prefer_timestamp_ntz
> 
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-05-20-18-38-39-769.png
>
>
> !image-2024-05-20-18-38-22-486.png|width=378,height=227!






[jira] [Created] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz

2024-05-20 Thread angerszhu (Jira)
angerszhu created SPARK-48340:
-

 Summary: Support TimestampNTZ  infer schema miss 
prefer_timestamp_ntz
 Key: SPARK-48340
 URL: https://issues.apache.org/jira/browse/SPARK-48340
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.1, 4.0.0
Reporter: angerszhu
 Attachments: image-2024-05-20-18-38-39-769.png

!image-2024-05-20-18-38-22-486.png|width=378,height=227!






[jira] [Updated] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz

2024-05-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48340:
--
Description: !image-2024-05-20-18-38-39-769.png|width=746,height=450!  
(was: !image-2024-05-20-18-38-22-486.png|width=378,height=227!)

> Support TimestampNTZ  infer schema miss prefer_timestamp_ntz
> 
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-05-20-18-38-39-769.png
>
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!






[jira] [Updated] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48265:
--
Description: 
{code:java}
24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
=== Result of Batch LocalRelation ===
Before:
 GlobalLimit 21
 +- LocalLimit 21
    +- Union false, false
       :- LocalLimit 21
       :  +- Project [item_id#647L]
       :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
       :        +- Relation db.table[,... 91 more fields] parquet
       +- LocalLimit 21
          +- Project [item_id#738L]
             +- LocalRelation , [, ... 91 more fields]
After:
 GlobalLimit 21
 +- LocalLimit 21
    +- LocalLimit 21
       +- Project [item_id#647L]
          +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
             +- Relation db.table[,... 91 more fields] parquet
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian Products has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch NormalizeFloatingNumbers has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch ReplaceUpdateFieldsExpression has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only Query has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from PartitionPruning has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that cannot be pushed down has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
Before:
 GlobalLimit 21
 +- LocalLimit 21
    +- LocalLimit 21
       +- Project [item_id#647L]
          +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
             +- Relation db.table[,... 91 more fields] parquet
After:
 GlobalLimit 21
 +- LocalLimit least(, ... 2 more fields)
    +- Project [item_id#647L]
       +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
          +- Relation db.table[,... 91 more fields] parquet
 {code}
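For context, the least(...) that EliminateLimits leaves in the limit expression is exactly 
what a follow-up constant-folding pass would reduce back to a literal. A minimal sketch 
against Catalyst (illustrative only; OneRowRelation stands in for the real child plan, and 
this is not the actual fix):
{code:java}
import org.apache.spark.sql.catalyst.expressions.{Least, Literal}
import org.apache.spark.sql.catalyst.optimizer.ConstantFolding
import org.apache.spark.sql.catalyst.plans.logical.{LocalLimit, OneRowRelation}

// The shape EliminateLimits leaves behind: the limit expression is least(21, 21)
// instead of the folded literal 21 (OneRowRelation stands in for the real child).
val combined = LocalLimit(Least(Seq(Literal(21), Literal(21))), OneRowRelation())

// The expression is foldable (all children are literals), so a constant-folding
// pass run after the batch can turn it back into a plain `LocalLimit 21`.
assert(combined.limitExpr.foldable)
val folded = ConstantFolding(combined)
{code}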

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Result of Batch LocalRelation ===
>  GlobalLimit 21                                                               
>                                                               GlobalLimit 21
>  +- LocalLimit 21                                                             
>                                                               +- LocalLimit 21
> !   +- Union false, false                                                     
>                                                                  +- 
> LocalLimit 21
> !      :- LocalLimit 21                                                       
>                                                                     +- 
> Project [item_id#647L]
> !      :  +- Project [item_id#647L]           

[jira] [Created] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread angerszhu (Jira)
angerszhu created SPARK-48265:
-

 Summary: Infer window group limit batch should do constant folding
 Key: SPARK-48265
 URL: https://issues.apache.org/jira/browse/SPARK-48265
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.1, 4.0.0
Reporter: angerszhu









[jira] [Updated] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed

2024-05-06 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48155:
--
Description: 
{code:java}
24/05/07 09:48:55 ERROR [main] PlanChangeLogger:
=== Applying Rule 
org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation ===
 Project [date#124, station_name#0, shipment_id#14]
 +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
AND station_type#1 IN (3,12))
    +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
more fields] 
!      +- Join LeftOuter, ((cast(date#124 as timestamp) >= 
cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) 
as timestamp)) AND (cast(date#124 as timestamp) + INTERVAL '-4' DAY <= 
cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) 
as timestamp)))
!         :- LogicalQueryStage Generate 
explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
false, [date#124], BroadcastQueryStage 0
!         +- LocalRelation , [shipment_id#14, station_name#5, ... 3 more fields]
24/05/07 09:48:55 ERROR [main] 





Project [date#124, station_name#0, shipment_id#14]
 +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
AND station_type#1 IN (3,12))
    +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
more fields]
!      +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 more 
fields]
!         +- LogicalQueryStage Generate 
explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
false, [date#124], BroadcastQueryStage 0 {code}
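In other words, once AQEPropagateEmptyRelation rewrites the join into a plain Project, the 
surviving child is a LogicalQueryStage that wraps only a BroadcastQueryStage, and executing 
that stage outside a broadcast join hits BroadcastExchange's unsupported execute() path, as 
the stack trace below shows. A rough sketch of the kind of guard the rule would need 
(illustrative only; the helper is an assumption, not the actual AQEPropagateEmptyRelation code):
{code:java}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.adaptive.{BroadcastQueryStageExec, LogicalQueryStage}

// A child that wraps a broadcast query stage can only be consumed by a broadcast join;
// if the join itself is eliminated, executing that child directly fails with
// "BroadcastExchange does not support the execute() code path".
def containsBroadcastStage(plan: LogicalPlan): Boolean =
  plan.collectFirst { case LogicalQueryStage(_, _: BroadcastQueryStageExec) => true }.isDefined
{code}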
{code:java}
java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.lang.UnsupportedOperationException: BroadcastExchange does not support the 
execute() code path.at 
org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652)
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526)
at 
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454)
at 
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50)
at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.aggregate.SortAggregateExec.doExecute(SortAggregateExec.scala:55)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:144)
at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:144)
at 

[jira] [Created] (SPARK-48155) PropagateEmpty relation cause LogicalQueryStage only with broadcast without join then execute failed

2024-05-06 Thread angerszhu (Jira)
angerszhu created SPARK-48155:
-

 Summary: PropagateEmpty relation cause LogicalQueryStage only with 
broadcast without join then execute failed
 Key: SPARK-48155
 URL: https://issues.apache.org/jira/browse/SPARK-48155
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.4, 3.5.1, 3.2.1
Reporter: angerszhu


{code:java}
24/05/07 09:48:55 ERROR [main] PlanChangeLogger:
=== Applying Rule 
org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation ===
 Project [date#124, station_name#0, shipment_id#14]
 +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
AND station_type#1 IN (3,12))
    +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
more fields] 
!      +- Join LeftOuter, ((cast(date#124 as timestamp) >= 
cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) 
as timestamp)) AND (cast(date#124 as timestamp) + INTERVAL '-4' DAY <= 
cast(from_unixtime((ctime#27L - 0), -MM-dd HH:mm:ss, Some(Asia/Singapore)) 
as timestamp)))
!         :- LogicalQueryStage Generate 
explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
false, [date#124], BroadcastQueryStage 0
!         +- LocalRelation , [shipment_id#14, station_name#5, ... 3 more fields]
24/05/07 09:48:55 ERROR [main] PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.execution.adaptive.AQEPropagateEmptyRelation ===
Project [date#124, station_name#0, shipment_id#14]
 +- Filter (status#2L INSET 1, 149, 2, 36, 400, 417, 418, 419, 49, 5, 50, 581 
AND station_type#1 IN (3,12))
    +- Aggregate [date#124, shipment_id#14], [date#124, shipment_id#14, ... 3 
more fields]
!      +- Project [date#124, cast(null as string) AS shipment_id#14, ... 4 more 
fields]
!         +- LogicalQueryStage Generate 
explode(org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@3a191e40), 
false, [date#124], BroadcastQueryStage 0 {code}
{code:java}
java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.lang.UnsupportedOperationException: BroadcastExchange does not support the 
execute() code path.at 
org.apache.spark.sql.errors.QueryExecutionErrors$.executeCodePathUnsupportedError(QueryExecutionErrors.scala:1652)
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:203)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:119)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:526)
at 
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:454)
at 
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:453)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:497)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50)
at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:750)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.aggregate.SortAggregateExec.doExecute(SortAggregateExec.scala:55)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:222)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:219) 
   at 

[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-04-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48027:
--
Description: 
{code:java}
with 
refund_info as (
    select
        loan_id,
        1 as refund_type
    from
        default.table_b
    where grass_date = '2024-04-25'
       
),
next_month_time as (
    select /*+ broadcast(b, c) */
         loan_id
        ,1 as final_repayment_time
    FROM default.table_c
    where grass_date = '2024-04-25'
)
select
        a.loan_id
        ,c.final_repayment_time
        ,b.refund_type    from
        (select
             loan_id
        from
            default.table_a2
        where grass_date = '2024-04-25'
        union all
        select
            loan_id
        from
            default.table_a1
        where grass_date = '2024-04-24' 
        ) a
    left join
        refund_info b
    on a.loan_id = b.loan_id
    left join
        next_month_time c
    on a.loan_id = c.loan_id
;
 {code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!

 

In this query, table_b is injected as table_c's runtime filter, but table_b is 
joined with a LEFT OUTER join, so the injected filter wrongly drops rows from table_c.

Caused by:

In InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handling the join, 
the left plan is a UNION so the result is None; the rule then zips the left/right 
keys and extracts from the right side instead, which causes this issue.

!image-2024-04-28-16-41-08-392.png|width=883,height=706!
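As the title says, the extraction should take the child join's type into account before 
picking a creation side. A rough sketch of that kind of check (illustrative only; the helper 
name and the per-join-type conditions are assumptions, not the actual InjectRuntimeFilter code):
{code:java}
import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, LeftSemi, RightOuter}
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}

// Only treat a child of `join` as a legal creation side for a runtime filter if the join
// type keeps that side's keys aligned with the rows that survive the join. For a LEFT OUTER
// join the right side (table_b here) gives no such guarantee, so it must be skipped.
def canUseAsCreationSide(join: Join, child: LogicalPlan): Boolean = join.joinType match {
  case Inner => true
  case LeftOuter | LeftSemi => child eq join.left
  case RightOuter => child eq join.right
  case _ => false
}
{code}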

  was:
{code:java}
with 
refund_info as (
    select
        loan_id,
        1 as refund_type
    from
        credit.table_b
    where grass_date = '2024-04-25'
       
),
next_month_time as (
    select /*+ broadcast(b, c) */
         loan_id
        ,1 as final_repayment_time
    FROM credit.table_c
    where grass_date = '2024-04-25'
)
select
        a.loan_id
        ,c.final_repayment_time
        ,b.refund_type    from
        (select
             loan_id
        from
            credit_fund.table_a2
        where grass_date = '2024-04-25' -- loans newly sold that day
        union all
        select
            loan_id
        from
            credit_fund.table_a1
        where grass_date = '2024-04-24' and loan_abs_status != 600 
        -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
        ) a
    left join
        refund_info b
    on a.loan_id = b.loan_id
    left join
        next_month_time c
    on a.loan_id = c.loan_id
;
 {code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!

 

In this query, table_b is injected as table_c's runtime filter, but table_b is 
joined with a LEFT OUTER join, so the injected filter wrongly drops rows from table_c.

Caused by:

In InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handling the join, 
the left plan is a UNION so the result is None; the rule then zips the left/right 
keys and extracts from the right side instead, which causes this issue.

!image-2024-04-28-16-41-08-392.png|width=883,height=706!


> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-04-28-16-38-37-510.png, 
> image-2024-04-28-16-41-08-392.png
>
>
> {code:java}
> with 
> refund_info as (
>     select
>         loan_id,
>         1 as refund_type
>     from
>         default.table_b
>     where grass_date = '2024-04-25'
>        
> ),
> next_month_time as (
>     select /*+ broadcast(b, c) */
>          loan_id
>         ,1 as final_repayment_time
>     FROM default.table_c
>     where grass_date = '2024-04-25'
> )
> select
>         a.loan_id
>         ,c.final_repayment_time
>         ,b.refund_type    from
>         (select
>              loan_id
>         from
>             default.table_a2
>         where grass_date = '2024-04-25'
>         union all
>         select
>             loan_id
>         from
>             default.table_a1
>         where grass_date = '2024-04-24' 
>         ) a
>     left join
>         refund_info b
>     on a.loan_id = b.loan_id
>     left join
>         next_month_time c
>     on a.loan_id = c.loan_id
> ;
>  {code}
> !image-2024-04-28-16-38-37-510.png|width=899,height=201!
>  
> In this query, table_b is injected as table_c's runtime filter, but table_b is 
> joined with a LEFT OUTER join, so the injected filter wrongly drops rows from table_c.
> Caused by:
> In InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handling the join, 
> the left plan is a UNION so the result is None; the rule then zips the left/right 
> keys and extracts from the right side instead, which causes this issue.
> !image-2024-04-28-16-41-08-392.png|width=883,height=706!




[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-04-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48027:
--
Description: 
{code:java}
with 
refund_info as (
    select
        loan_id,
        1 as refund_type
    from
        credit.table_b
    where grass_date = '2024-04-25'
       
),
next_month_time as (
    select /*+ broadcast(b, c) */
         loan_id
        ,1 as final_repayment_time
    FROM credit.table_c
    where grass_date = '2024-04-25'
)
select
        a.loan_id
        ,c.final_repayment_time
        ,b.refund_type    from
        (select
             loan_id
        from
            credit_fund.table_a2
        where grass_date = '2024-04-25' -- loans newly sold that day
        union all
        select
            loan_id
        from
            credit_fund.table_a1
        where grass_date = '2024-04-24' and loan_abs_status != 600 
        -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
        ) a
    left join
        refund_info b
    on a.loan_id = b.loan_id
    left join
        next_month_time c
    on a.loan_id = c.loan_id
;
 {code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!

 

In this query, table_b is injected as table_c's runtime filter, but table_b is 
joined with a LEFT OUTER join, so the injected filter wrongly drops rows from table_c.

Caused by:

In InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handling the join, 
the left plan is a UNION so the result is None; the rule then zips the left/right 
keys and extracts from the right side instead, which causes this issue.

!image-2024-04-28-16-41-08-392.png|width=883,height=706!

  was:
{code:java}
with 
refund_info as (
    select
        loan_id,
        1 as refund_type
    from
        credit.table_b
    where grass_date = '2024-04-25'
       
),
next_month_time as (
    select /*+ broadcast(b, c) */
         loan_id
        ,1 as final_repayment_time
    FROM credit.table_c
    where grass_date = '2024-04-25'
)
select
        a.loan_id
        ,c.final_repayment_time
        ,b.refund_type    from
        (select
             loan_id
        from
            credit_fund.table_a2
        where grass_date = '2024-04-25' -- loans newly sold that day
        union all
        select
            loan_id
        from
            credit_fund.table_a1
        where grass_date = '2024-04-24' and loan_abs_status != 600 
        -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
        ) a
    left join
        refund_info b
    on a.loan_id = b.loan_id
    left join
        next_month_time c
    on a.loan_id = c.loan_id
;
 {code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!


> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-04-28-16-38-37-510.png, 
> image-2024-04-28-16-41-08-392.png
>
>
> {code:java}
> with 
> refund_info as (
>     select
>         loan_id,
>         1 as refund_type
>     from
>         credit.table_b
>     where grass_date = '2024-04-25'
>        
> ),
> next_month_time as (
>     select /*+ broadcast(b, c) */
>          loan_id
>         ,1 as final_repayment_time
>     FROM credit.table_c
>     where grass_date = '2024-04-25'
> )
> select
>         a.loan_id
>         ,c.final_repayment_time
>         ,b.refund_type    from
>         (select
>              loan_id
>         from
>             credit_fund.table_a2
>         where grass_date = '2024-04-25' -- loans newly sold that day
>         union all
>         select
>             loan_id
>         from
>             credit_fund.table_a1
>         where grass_date = '2024-04-24' and loan_abs_status != 600 
>         -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
>         ) a
>     left join
>         refund_info b
>     on a.loan_id = b.loan_id
>     left join
>         next_month_time c
>     on a.loan_id = c.loan_id
> ;
>  {code}
> !image-2024-04-28-16-38-37-510.png|width=899,height=201!
>  
> In this query, table_b is injected as table_c's runtime filter, but table_b is 
> joined with a LEFT OUTER join, so the injected filter wrongly drops rows from table_c.
> Caused by:
> In InjectRuntimeFilter.extractSelectiveFilterOverScan(), when handling the join, 
> the left plan is a UNION so the result is None; the rule then zips the left/right 
> keys and extracts from the right side instead, which causes this issue.
> !image-2024-04-28-16-41-08-392.png|width=883,height=706!






[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-04-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48027:
--
Attachment: image-2024-04-28-16-41-08-392.png

> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-04-28-16-38-37-510.png, 
> image-2024-04-28-16-41-08-392.png
>
>
> {code:java}
> with 
> refund_info as (
>     select
>         loan_id,
>         1 as refund_type
>     from
>         credit.table_b
>     where grass_date = '2024-04-25'
>        
> ),
> next_month_time as (
>     select /*+ broadcast(b, c) */
>          loan_id
>         ,1 as final_repayment_time
>     FROM credit.table_c
>     where grass_date = '2024-04-25'
> )
> select
>         a.loan_id
>         ,c.final_repayment_time
>         ,b.refund_type    from
>         (select
>              loan_id
>         from
>             credit_fund.table_a2
>         where grass_date = '2024-04-25' -- loans newly sold that day
>         union all
>         select
>             loan_id
>         from
>             credit_fund.table_a1
>         where grass_date = '2024-04-24' and loan_abs_status != 600 
>         -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
>         ) a
>     left join
>         refund_info b
>     on a.loan_id = b.loan_id
>     left join
>         next_month_time c
>     on a.loan_id = c.loan_id
> ;
>  {code}
> !image-2024-04-28-16-38-37-510.png|width=899,height=201!






[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-04-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48027:
--
Description: 
{code:java}
with 
refund_info as (
    select
        loan_id,
        1 as refund_type
    from
        credit.table_b
    where grass_date = '2024-04-25'
       
),
next_month_time as (
    select /*+ broadcast(b, c) */
         loan_id
        ,1 as final_repayment_time
    FROM credit.table_c
    where grass_date = '2024-04-25'
)
select
        a.loan_id
        ,c.final_repayment_time
        ,b.refund_type    from
        (select
             loan_id
        from
            credit_fund.table_a2
        where grass_date = '2024-04-25' -- loans newly sold that day
        union all
        select
            loan_id
        from
            credit_fund.table_a1
        where grass_date = '2024-04-24' and loan_abs_status != 600 
        -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
        ) a
    left join
        refund_info b
    on a.loan_id = b.loan_id
    left join
        next_month_time c
    on a.loan_id = c.loan_id
;
 {code}
!image-2024-04-28-16-38-37-510.png|width=899,height=201!

> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-04-28-16-38-37-510.png
>
>
> {code:java}
> with 
> refund_info as (
>     select
>         loan_id,
>         1 as refund_type
>     from
>         credit.table_b
>     where grass_date = '2024-04-25'
>        
> ),
> next_month_time as (
>     select /*+ broadcast(b, c) */
>          loan_id
>         ,1 as final_repayment_time
>     FROM credit.table_c
>     where grass_date = '2024-04-25'
> )
> select
>         a.loan_id
>         ,c.final_repayment_time
>         ,b.refund_type    from
>         (select
>              loan_id
>         from
>             credit_fund.table_a2
>         where grass_date = '2024-04-25' -- loans newly sold that day
>         union all
>         select
>             loan_id
>         from
>             credit_fund.table_a1
>         where grass_date = '2024-04-24' and loan_abs_status != 600 
>         -- historical cumulative cutoff loans; filter out loans revoked again after cutoff (status flow 100 -> 600)
>         ) a
>     left join
>         refund_info b
>     on a.loan_id = b.loan_id
>     left join
>         next_month_time c
>     on a.loan_id = c.loan_id
> ;
>  {code}
> !image-2024-04-28-16-38-37-510.png|width=899,height=201!






[jira] [Updated] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-04-28 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48027:
--
Attachment: image-2024-04-28-16-38-37-510.png

> InjectRuntimeFilter for multi-level join should check child join type
> -
>
> Key: SPARK-48027
> URL: https://issues.apache.org/jira/browse/SPARK-48027
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-04-28-16-38-37-510.png
>
>







[jira] [Created] (SPARK-48027) InjectRuntimeFilter for multi-level join should check child join type

2024-04-28 Thread angerszhu (Jira)
angerszhu created SPARK-48027:
-

 Summary: InjectRuntimeFilter for multi-level join should check 
child join type
 Key: SPARK-48027
 URL: https://issues.apache.org/jira/browse/SPARK-48027
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: angerszhu









[jira] [Commented] (SPARK-33826) InsertIntoHiveTable generate HDFS file with invalid user

2024-04-21 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839449#comment-17839449
 ] 

angerszhu commented on SPARK-33826:
---

What is RIK? I'm not sure what you mean.

> InsertIntoHiveTable generate HDFS file with invalid user
> 
>
> Key: SPARK-33826
> URL: https://issues.apache.org/jira/browse/SPARK-33826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.0.0
>Reporter: Zhang Jianguo
>Priority: Minor
>
> *Arch:* Hive on Spark.
>  
> *Version:* Spark 2.3.2
>  
> *Conf:*
> Enable user impersonation
> hive.server2.enable.doAs=true
>  
> *Scenario:*
> Thriftserver is running as login user A, and tasks also run as user A.
> The client executes SQL as user B.
>  
> Data generated by the SQL "insert into TABLE \[tbl\] select XXX from ." is 
> written to HDFS on the executor, and the executor doesn't know about B.
>  
> *{color:#de350b}So the owner of the file written to HDFS will be user A, when it 
> should be B.{color}*
>  
> I also checked the implementation of Spark 3.0.0; it could have the same issue.
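For reference, the usual way to make such a write carry the session user is Hadoop's 
proxy-user doAs pattern, roughly sketched below (a hedged illustration with placeholder user 
names, not InsertIntoHiveTable's actual code; it also requires hadoop.proxyuser.* settings 
that allow A to impersonate B):
{code:java}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Wrap the HDFS writes in a proxy-user context so the resulting files are owned by user B
// even though the process itself logged in as user A.
val proxyUgi = UserGroupInformation.createProxyUser("userB", UserGroupInformation.getLoginUser)
proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // perform the FileSystem writes for the INSERT here
  }
})
{code}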






[jira] [Resolved] (SPARK-47294) OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)

2024-03-05 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu resolved SPARK-47294.
---
Resolution: Not A Problem

> OptimizeSkewInRebalanceRepartitions should support 
> ProjectExec(_,ShuffleQueryStageExec)
> ---
>
> Key: SPARK-47294
> URL: https://issues.apache.org/jira/browse/SPARK-47294
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
>  Labels: pull-request-available
>
> Currently OptimizeSkewInRebalanceRepartitions only matches a bare 
> ShuffleQueryStageExec. That covers plain SQL queries but does not work for 
> inserts, since there is a Project between the ShuffleQueryStageExec and the 
> insert command.
> {code:java}
> plan transformUp {
>   case p @ ProjectExec(_, stage: ShuffleQueryStageExec) if 
> isSupported(stage.shuffle) =>
> p.copy(child = tryOptimizeSkewedPartitions(stage))
>   case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
> tryOptimizeSkewedPartitions(stage)
> } {code}






[jira] [Updated] (SPARK-47294) OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)

2024-03-05 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-47294:
--
Description: 
Currently OptimizeSkewInRebalanceRepartitions only matches a bare 
ShuffleQueryStageExec. That covers plain SQL queries but does not work for inserts, 
since there is a Project between the ShuffleQueryStageExec and the insert command.
{code:java}
plan transformUp {
  case p @ ProjectExec(_, stage: ShuffleQueryStageExec) if 
isSupported(stage.shuffle) =>
p.copy(child = tryOptimizeSkewedPartitions(stage))
  case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
tryOptimizeSkewedPartitions(stage)
} {code}

> OptimizeSkewInRebalanceRepartitions should support 
> ProjectExec(_,ShuffleQueryStageExec)
> ---
>
> Key: SPARK-47294
> URL: https://issues.apache.org/jira/browse/SPARK-47294
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
>
> Currently OptimizeSkewInRebalanceRepartitions only matches a bare 
> ShuffleQueryStageExec. That covers plain SQL queries but does not work for 
> inserts, since there is a Project between the ShuffleQueryStageExec and the 
> insert command.
> {code:java}
> plan transformUp {
>   case p @ ProjectExec(_, stage: ShuffleQueryStageExec) if 
> isSupported(stage.shuffle) =>
> p.copy(child = tryOptimizeSkewedPartitions(stage))
>   case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
> tryOptimizeSkewedPartitions(stage)
> } {code}






[jira] [Created] (SPARK-47294) OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)

2024-03-05 Thread angerszhu (Jira)
angerszhu created SPARK-47294:
-

 Summary: OptimizeSkewInRebalanceRepartitions should support 
ProjectExec(_,ShuffleQueryStageExec)
 Key: SPARK-47294
 URL: https://issues.apache.org/jira/browse/SPARK-47294
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.1, 4.0.0
Reporter: angerszhu









[jira] [Updated] (SPARK-46741) CacheTable AsSelect should inherit from CTEInChildren to make sure it can be matched

2024-01-16 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46741:
--
Description: 
In the current code, since CacheTableAsSelect does not inherit CTEInChildren, it 
still returns forceInline, which makes the cached result impossible to match.

!image-2024-01-17-11-48-28-867.png|width=859,height=363!

> CacheTable AsSelect should inherit from CTEInChildren to make sure it can be 
> matched
> 
>
> Key: SPARK-46741
> URL: https://issues.apache.org/jira/browse/SPARK-46741
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-01-17-11-48-28-867.png
>
>
> In the current code, since CacheTableAsSelect does not inherit CTEInChildren, 
> it still returns forceInline, which makes the cached result impossible to match.
> !image-2024-01-17-11-48-28-867.png|width=859,height=363!






[jira] [Updated] (SPARK-46741) CacheTable AsSelect should inherit from CTEInChildren to make sure it can be matched

2024-01-16 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46741:
--
Attachment: image-2024-01-17-11-48-28-867.png

> CacheTable AsSelect should inherit from CTEInChildren to make sure it can be 
> matched
> 
>
> Key: SPARK-46741
> URL: https://issues.apache.org/jira/browse/SPARK-46741
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2024-01-17-11-48-28-867.png
>
>







[jira] [Created] (SPARK-46741) CacheTable AsSelect should inherit from CTEInChildren to make sure it can be matched

2024-01-16 Thread angerszhu (Jira)
angerszhu created SPARK-46741:
-

 Summary: CacheTable AsSelect should inherit from CTEInChildren to 
make sure it can be matched
 Key: SPARK-46741
 URL: https://issues.apache.org/jira/browse/SPARK-46741
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: angerszhu









[jira] [Updated] (SPARK-46034) SparkContext add file should also copy file to local root path

2023-11-21 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46034:
--
Description: 
The case below fails with FileNotFoundException:
{code:java}
add jar hdfs://path/search_hadoop_udf-1.0.0-SNAPSHOT.jar;
add file hdfs://path/feature_map.txt;

CREATE or replace TEMPORARY FUNCTION si_to_fn AS 
"com.shopee.deep.data_mart.udf.SlotIdToFeatName";

select si_to_fn(k, './feature_map.txt') as feat_name
from (
select 'slot_8116' as k
union all
select 'slot_2219' as k)
 A; {code}
The cause is that the user uses a values (inline) table, so the task runs on the 
driver, but the driver did not copy the added file to its local root path, so the 
lookup fails.
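A driver-side illustration of the gap (a hedged user-level workaround, assuming the UDF 
resolves './feature_map.txt' against the driver's working directory; this is not the 
proposed SparkContext fix):
{code:java}
import java.nio.file.{Files, Paths, StandardCopyOption}
import org.apache.spark.SparkFiles

// After `add file hdfs://path/feature_map.txt`, SparkFiles.get points at the copy the
// driver downloaded, but nothing places it under the driver's current working directory,
// which is where a relative './feature_map.txt' lookup ends up when the task runs locally.
val downloaded = Paths.get(SparkFiles.get("feature_map.txt"))
val driverLocal = Paths.get("./feature_map.txt")
if (!Files.exists(driverLocal)) {
  Files.copy(downloaded, driverLocal, StandardCopyOption.REPLACE_EXISTING)
}
{code}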

> SparkContext add file should also copy file to local root path
> --
>
> Key: SPARK-46034
> URL: https://issues.apache.org/jira/browse/SPARK-46034
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> The case below fails with FileNotFoundException:
> {code:java}
> add jar hdfs://path/search_hadoop_udf-1.0.0-SNAPSHOT.jar;
> add file hdfs://path/feature_map.txt;
> CREATE or replace TEMPORARY FUNCTION si_to_fn AS 
> "com.shopee.deep.data_mart.udf.SlotIdToFeatName";
> select si_to_fn(k, './feature_map.txt') as feat_name
> from (
> select 'slot_8116' as k
> union all
> select 'slot_2219' as k)
>  A; {code}
> The cause is that the user uses a values (inline) table, so the task runs on 
> the driver, but the driver did not copy the added file to its local root path, 
> so the lookup fails.






[jira] [Created] (SPARK-46034) SparkContext add file should also copy file to local root path

2023-11-21 Thread angerszhu (Jira)
angerszhu created SPARK-46034:
-

 Summary: SparkContext add file should also copy file to local root 
path
 Key: SPARK-46034
 URL: https://issues.apache.org/jira/browse/SPARK-46034
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: angerszhu









[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Description: 
We hit a case where the user calls sc.stop() after all custom code has run, but it 
gets stuck somewhere.

This causes the following situation:
 # The user calls sc.stop()
 # sc.stop() gets stuck partway through, but SchedulerBackend.stop has already been called
 # Since the YARN ApplicationMaster has not finished, it keeps calling 
YarnAllocator.allocateResources()
 # Since the driver endpoint has stopped, newly allocated executors fail to register
 # until "Max number of executor failures" is triggered

Caused by:

Before stopping, CoarseGrainedSchedulerBackend.stop() calls 
YarnSchedulerBackend.requestTotalExecutor() to clean up the request info

!image-2023-11-20-17-56-56-507.png|width=898,height=297!

 

From the log we can confirm that CoarseGrainedSchedulerBackend.stop() was called.

 

 

When YarnAllocator handles this empty resource request, since 
resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning up 
targetNumExecutorsPerResourceProfileId.

!image-2023-11-20-17-56-45-212.png|width=708,height=379!
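The missed cleanup boils down to something like the following toy sketch (illustrative only; 
the method and the map are simplified stand-ins, not YarnAllocator's real fields):
{code:java}
import scala.collection.mutable

// Toy model of the per-ResourceProfile executor targets kept by the allocator. When the
// driver shuts down it sends an empty request; if the update only iterates over the
// incoming map, the old targets are never reset to 0 and the allocator keeps asking YARN
// for executors that can no longer register.
def updateTargets(requested: Map[Int, Int], targets: mutable.Map[Int, Int]): Unit = {
  // reset profiles that are no longer requested (the step that is currently missed)
  targets.keys.toSeq.foreach { rpId =>
    if (!requested.contains(rpId)) targets.update(rpId, 0)
  }
  requested.foreach { case (rpId, num) => targets.update(rpId, num) }
}

val targets = mutable.Map(0 -> 10) // the default profile currently wants 10 executors
updateTargets(Map.empty, targets)  // driver stopping: empty request
assert(targets(0) == 0)
{code}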

 

 

  was:
We meet a case that user call sc.stop() after run all custom code, but stuck in 
some place. 

Cause below situation
 # User call sc.stop()
 # sc.stop() stuck in some process, but SchedulerBackend.stop was called
 # Since the YARN ApplicationMaster didn't finish, it still calls 
YarnAllocator.allocateResources()
Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
triggering "Max number of executor failures"


> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We hit a case where the user calls sc.stop() after all custom code has run, but 
> it gets stuck somewhere. 
> This causes the following situation:
>  # The user calls sc.stop()
>  # sc.stop() gets stuck partway through, but SchedulerBackend.stop has already been called
>  # Since the YARN ApplicationMaster has not finished, it keeps calling 
> YarnAllocator.allocateResources()
>  # Since the driver endpoint has stopped, newly allocated executors fail to register
>  # until "Max number of executor failures" is triggered
> Caused by 
> Before call CoarseGrainedSchedulerBackend.stop() will call 
> YarnSchedulerBackend.requestTotalExecutor() to clean request info
> !image-2023-11-20-17-56-56-507.png|width=898,height=297!
>  
> From the log we make sure that CoarseGrainedSchedulerBackend.stop()  was 
> called
>  
>  
> When YarnAllocator handles this empty resource request, since 
> resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning up 
> targetNumExecutorsPerResourceProfileId.
> !image-2023-11-20-17-56-45-212.png|width=708,height=379!
>  
>  






[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Attachment: image-2023-11-20-17-56-56-507.png

> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We meet a case that user call sc.stop() after run all custom code, but stuck 
> in some place. 
> Cause below situation
>  # User call sc.stop()
>  # sc.stop() stuck in some process, but SchedulerBackend.stop was called
>  # Since the YARN ApplicationMaster didn't finish, it still calls 
> YarnAllocator.allocateResources()
> Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
> triggering "Max number of executor failures"






[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Attachment: image-2023-11-20-17-56-45-212.png

> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We meet a case that user call sc.stop() after run all custom code, but stuck 
> in some place. 
> Cause below situation
>  # User call sc.stop()
>  # sc.stop() stuck in some process, but SchedulerBackend.stop was called
>  # Since the YARN ApplicationMaster didn't finish, it still calls 
> YarnAllocator.allocateResources()
> Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
> triggering "Max number of executor failures"






[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Description: 
We meet a case that user call sc.stop() after run all custom code, but stuck in 
some place. 

Cause below situation
 # User call sc.stop()
 # sc.stop() stuck in some process, but SchedulerBackend.stop was called
 # Since the YARN ApplicationMaster didn't finish, it still calls 
YarnAllocator.allocateResources()
Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
triggering "Max number of executor failures"

  was:
We meet a case that user call sc.stop() after run all custom code, but stuck in 
some place. 

Cause below situation
 # User call sc.stop()
 # sc.stop() stuck in some process, but SchedulerBackend.stop was called
The ApplicationMaster has not reached finish yet, so it keeps calling YarnAllocator.allocateResources()
Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
triggering "Max number of executor failures"


> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> We meet a case that user call sc.stop() after run all custom code, but stuck 
> in some place. 
> Cause below situation
>  # User call sc.stop()
>  # sc.stop() stuck in some process, but SchedulerBackend.stop was called
>  # Since the YARN ApplicationMaster didn't finish, it still calls 
> YarnAllocator.allocateResources()
> Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
> triggering "Max number of executor failures"






[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Description: 
We meet a case that user call sc.stop() after run all custom code, but stuck in 
some place. 

Cause below situation
 # User call sc.stop()
 # sc.stop() stuck in some process, but SchedulerBackend.stop was called
The ApplicationMaster has not reached finish yet, so it keeps calling YarnAllocator.allocateResources()
Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
triggering "Max number of executor failures"

  was:
We meet a case that user call sc.stop() after run all custom code, but stuck in 
some place. 

Cause below situation
 # User call sc.stop()
 # 
sc.stop() gets stuck
The ApplicationMaster has not reached finish yet, so it keeps calling YarnAllocator.allocateResources()
Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
triggering "Max number of executor failures"


> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> We meet a case that user call sc.stop() after run all custom code, but stuck 
> in some place. 
> Cause below situation
>  # User call sc.stop()
>  # sc.stop() stuck in some process, but SchedulerBackend.stop was called
> The ApplicationMaster has not reached finish yet, so it keeps calling YarnAllocator.allocateResources()
> Because the driver port has already been closed, allocateResource keeps requesting new executors, which then fail,
> triggering "Max number of executor failures"






[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Description: 
We hit a case where the user calls sc.stop() after running all custom code, but it gets
stuck at some point.

This causes the following situation:
 # User calls sc.stop()
 # 
sc.stop() gets stuck
The ApplicationMaster has not finished yet, so it keeps calling YarnAllocator.allocateResources().
Because the driver port is already closed, allocateResources keeps failing to request new executors,
which triggers "Max number of executor failures".

> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> We hit a case where the user calls sc.stop() after running all custom code, but it
> gets stuck at some point.
> This causes the following situation:
>  # User calls sc.stop()
>  # 
> sc.stop() gets stuck
> The ApplicationMaster has not finished yet, so it keeps calling YarnAllocator.allocateResources().
> Because the driver port is already closed, allocateResources keeps failing to request
> new executors, which triggers "Max number of executor failures"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)
angerszhu created SPARK-46006:
-

 Summary: YarnAllocator miss clean 
targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop
 Key: SPARK-46006
 URL: https://issues.apache.org/jira/browse/SPARK-46006
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 3.5.0, 3.4.1, 3.3.2, 3.2.4, 3.1.3
Reporter: angerszhu
 Fix For: 3.4.2, 4.0.0, 3.5.1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43064) Spark SQL CLI SQL tab should only show once statement once

2023-04-07 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-43064:
--
Description: !screenshot-1.png|width=996,height=554!  (was:  
!screenshot-1.png! )

> Spark SQL CLI SQL tab should only show once statement once
> --
>
> Key: SPARK-43064
> URL: https://issues.apache.org/jira/browse/SPARK-43064
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
> Attachments: screenshot-1.png
>
>
> !screenshot-1.png|width=996,height=554!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43064) Spark SQL CLI SQL tab should only show once statement once

2023-04-07 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-43064:
--
Description:  !screenshot-1.png! 

> Spark SQL CLI SQL tab should only show once statement once
> --
>
> Key: SPARK-43064
> URL: https://issues.apache.org/jira/browse/SPARK-43064
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43064) Spark SQL CLI SQL tab should only show once statement once

2023-04-07 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-43064:
--
Attachment: screenshot-1.png

> Spark SQL CLI SQL tab should only show once statement once
> --
>
> Key: SPARK-43064
> URL: https://issues.apache.org/jira/browse/SPARK-43064
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43064) Spark SQL CLI SQL tab should only show once statement once

2023-04-07 Thread angerszhu (Jira)
angerszhu created SPARK-43064:
-

 Summary: Spark SQL CLI SQL tab should only show once statement once
 Key: SPARK-43064
 URL: https://issues.apache.org/jira/browse/SPARK-43064
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42699) SparkConnectServer should make client and AM same exit code

2023-03-07 Thread angerszhu (Jira)
angerszhu created SPARK-42699:
-

 Summary: SparkConnectServer should make client and AM same exit 
code
 Key: SPARK-42699
 URL: https://issues.apache.org/jira/browse/SPARK-42699
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Spark Core
Affects Versions: 3.5.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-42698:
--
Parent: SPARK-36623
Issue Type: Sub-task  (was: Bug)

> Client mode submit task client should keep same exitcode with AM
> 
>
> Key: SPARK-42698
> URL: https://issues.apache.org/jira/browse/SPARK-42698
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>
> ```
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
> throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
> !isThriftServer(args.mainClass)) {
> try {
>   SparkContext.getActive.foreach(_.stop())
> } catch {
>   case e: Throwable => logError(s"Failed to close SparkContext: $e")
> }
>   }
> }
>   }
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-42698:
--
Description: 
```
try {
  app.start(childArgs.toArray, sparkConf)
} catch {
  case t: Throwable =>
throw findCause(t)
} finally {
  if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
!isThriftServer(args.mainClass)) {
try {
  SparkContext.getActive.foreach(_.stop())
} catch {
  case e: Throwable => logError(s"Failed to close SparkContext: $e")
}
  }
}
  }
```

> Client mode submit task client should keep same exitcode with AM
> 
>
> Key: SPARK-42698
> URL: https://issues.apache.org/jira/browse/SPARK-42698
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>
> ```
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
> throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
> !isThriftServer(args.mainClass)) {
> try {
>   SparkContext.getActive.foreach(_.stop())
> } catch {
>   case e: Throwable => logError(s"Failed to close SparkContext: $e")
> }
>   }
> }
>   }
> ```
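
As a hedged sketch of the intent (not necessarily the merged change), the snippet above could propagate the user application's exit code instead of swallowing it; SparkUserAppException already carries an exit code, and the other names (app, childArgs, sparkConf, findCause) are the ones from the quoted snippet.

{code:java}
// Hedged sketch only: reuses the surrounding definitions from the snippet above.
try {
  app.start(childArgs.toArray, sparkConf)
} catch {
  case e: SparkUserAppException =>
    // The user application (and therefore the AM) reported a non-zero exit
    // code; exit the submitting client with the same code.
    System.exit(e.exitCode)
  case t: Throwable =>
    throw findCause(t)
}
{code}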



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread angerszhu (Jira)
angerszhu created SPARK-42698:
-

 Summary: Client mode submit task client should keep same exitcode 
with AM
 Key: SPARK-42698
 URL: https://issues.apache.org/jira/browse/SPARK-42698
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 3.5.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40019) Refactor comment of ArrayType

2022-08-09 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-40019:
--
Summary: Refactor comment of ArrayType  (was: Refactor comment of ArrayType 
and MapType)

> Refactor comment of ArrayType
> -
>
> Key: SPARK-40019
> URL: https://issues.apache.org/jira/browse/SPARK-40019
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0, 3.2.2
>Reporter: angerszhu
>Priority: Major
>
> Currently the `containsNull` parameter of ArrayType/MapType is confusing; we
> need to add comments explaining it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40019) Refactor comment of ArrayType and MapType

2022-08-09 Thread angerszhu (Jira)
angerszhu created SPARK-40019:
-

 Summary: Refactor comment of ArrayType and MapType
 Key: SPARK-40019
 URL: https://issues.apache.org/jira/browse/SPARK-40019
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.2, 3.3.0
Reporter: angerszhu


Currently the `containsNull` parameter of ArrayType/MapType is confusing; we need to
add comments explaining it.
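
For reference, a small self-contained example of what the comments should spell out (standard Spark types API, nothing assumed beyond that): containsNull / valueContainsNull describe element-level nullability, not column-level nullability.

{code:java}
import org.apache.spark.sql.types._

// containsNull says whether the array *elements* may be null; it does not say
// whether the array column itself may be null.
val elementsNeverNull = ArrayType(IntegerType, containsNull = false)
val elementsMaybeNull = ArrayType(IntegerType, containsNull = true)

// The analogous flag for maps applies to the map *values*.
val valuesMaybeNull = MapType(StringType, IntegerType, valueContainsNull = true)

// Column-level nullability is still controlled by StructField.nullable.
val field = StructField("scores", elementsMaybeNull, nullable = false)
{code}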



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39776) Join‘ verbose string didn't contains JoinType

2022-07-14 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39776:
--
Description: 
Currently the verbose string does not include the join type


{code:java}
(5) BroadcastHashJoin [codegen id : 8]
Left keys [1]: [ss_sold_date_sk#3]
Right keys [1]: [d_date_sk#5]
Join condition: None
{code}

{code:java}
override def verboseStringWithOperatorId(): String = {
val joinCondStr = if (condition.isDefined) {
  s"${condition.get}"
} else "None"
if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
 |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
} else {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
}
  }
{code}


  was:
Currently the verbose string does not include the join type
{code:java}
override def verboseStringWithOperatorId(): String = {
val joinCondStr = if (condition.isDefined) {
  s"${condition.get}"
} else "None"
if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
 |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
} else {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
}
  }
{code}



> Join‘ verbose string didn't contains JoinType
> -
>
> Key: SPARK-39776
> URL: https://issues.apache.org/jira/browse/SPARK-39776
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
>  Currently the verbose string does not include the join type
> {code:java}
> (5) BroadcastHashJoin [codegen id : 8]
> Left keys [1]: [ss_sold_date_sk#3]
> Right keys [1]: [d_date_sk#5]
> Join condition: None
> {code}
> {code:java}
> override def verboseStringWithOperatorId(): String = {
> val joinCondStr = if (condition.isDefined) {
>   s"${condition.get}"
> } else "None"
> if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
>   s"""
>  |$formattedNodeName
>  |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
>  |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
>  |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
>  |""".stripMargin
> } else {
>   s"""
>  |$formattedNodeName
>  |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
>  |""".stripMargin
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39776) Join‘ verbose string didn't contains JoinType

2022-07-14 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39776:
--
Description: 
Currently the verbose string does not include the join type
{code:java}
override def verboseStringWithOperatorId(): String = {
val joinCondStr = if (condition.isDefined) {
  s"${condition.get}"
} else "None"
if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
 |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
} else {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
}
  }
{code}


  was:
·
{code:java}
override def verboseStringWithOperatorId(): String = {
val joinCondStr = if (condition.isDefined) {
  s"${condition.get}"
} else "None"
if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
 |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
} else {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
}
  }
{code}



> Join‘ verbose string didn't contains JoinType
> -
>
> Key: SPARK-39776
> URL: https://issues.apache.org/jira/browse/SPARK-39776
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
>  Currently the verbose string does not include the join type
> {code:java}
> override def verboseStringWithOperatorId(): String = {
> val joinCondStr = if (condition.isDefined) {
>   s"${condition.get}"
> } else "None"
> if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
>   s"""
>  |$formattedNodeName
>  |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
>  |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
>  |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
>  |""".stripMargin
> } else {
>   s"""
>  |$formattedNodeName
>  |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
>  |""".stripMargin
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39776) Join‘ verbose string didn't contains JoinType

2022-07-14 Thread angerszhu (Jira)
angerszhu created SPARK-39776:
-

 Summary: Join‘ verbose string didn't contains JoinType
 Key: SPARK-39776
 URL: https://issues.apache.org/jira/browse/SPARK-39776
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


·
{code:java}
override def verboseStringWithOperatorId(): String = {
val joinCondStr = if (condition.isDefined) {
  s"${condition.get}"
} else "None"
if (leftKeys.nonEmpty || rightKeys.nonEmpty) {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
 |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
} else {
  s"""
 |$formattedNodeName
 |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
 |""".stripMargin
}
  }
{code}
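
One possible shape of the fix (a sketch against the members shown above, not necessarily the merged patch) is to emit the join type as an extra field:

{code:java}
// Sketch only: joinType, leftKeys, rightKeys, condition and formattedNodeName
// are the members already used in the snippet above.
override def verboseStringWithOperatorId(): String = {
  val joinCondStr = if (condition.isDefined) s"${condition.get}" else "None"
  s"""
     |$formattedNodeName
     |${ExplainUtils.generateFieldString("Join type", joinType.toString)}
     |${ExplainUtils.generateFieldString("Left keys", leftKeys)}
     |${ExplainUtils.generateFieldString("Right keys", rightKeys)}
     |${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
     |""".stripMargin
}
{code}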




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39527) V2Catalog rename not support newIdent with catalog

2022-06-20 Thread angerszhu (Jira)
angerszhu created SPARK-39527:
-

 Summary: V2Catalog rename not support newIdent with catalog
 Key: SPARK-39527
 URL: https://issues.apache.org/jira/browse/SPARK-39527
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu



{code:java}

  test("rename a table") {
sql("ALTER TABLE h2.test.empty_table RENAME TO h2.test.empty_table2")
checkAnswer(
  sql("SHOW TABLES IN h2.test"),
  Seq(Row("test", "empty_table2")))
  }
{code}


{code:java}
[info] - rename a table *** FAILED *** (2 seconds, 358 milliseconds)
[info]   org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException: 
Failed table renaming from test.empty_table to h2.test.empty_table2
[info]   at 
org.apache.spark.sql.jdbc.H2Dialect$.classifyException(H2Dialect.scala:117)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.classifyException(JdbcUtils.scala:1176)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.$anonfun$renameTable$1(JDBCTableCatalog.scala:102)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.$anonfun$renameTable$1$adapted(JDBCTableCatalog.scala:100)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.withConnection(JdbcUtils.scala:1184)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog.renameTable(JDBCTableCatalog.scala:100)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.RenameTableExec.run(RenameTableExec.scala:51)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
[info]   at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
[info]   at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
[info]   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:111)
[info]   at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:171)
[info]   at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
[info]   at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
[info]   at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
[info]   at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
[info]   at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
[info]   at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
[info]   at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
[info]   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
[info]   at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
[info]   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
[info]   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
{code}
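
A hedged illustration of the behavior gap, using the same table names as the failing test: the rename only works while the target identifier stays catalog-relative.

{code:java}
// Works: the new identifier omits the catalog, so both sides resolve inside
// the h2 catalog.
sql("ALTER TABLE h2.test.empty_table RENAME TO test.empty_table2")

// Fails today with NoSuchNamespaceException, because the leading "h2" is
// treated as part of the new identifier's namespace rather than as a catalog.
sql("ALTER TABLE h2.test.empty_table RENAME TO h2.test.empty_table2")
{code}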





--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39400) spark-sql remain hive resource download dir after exit

2022-06-07 Thread angerszhu (Jira)
angerszhu created SPARK-39400:
-

 Summary: spark-sql remain hive resource download dir after exit
 Key: SPARK-39400
 URL: https://issues.apache.org/jira/browse/SPARK-39400
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu
 Fix For: 3.4.0



{code:java}
drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:06 
da92eec4-2db1-4941-9e53-b28c38e25e31_resources
drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:14 
dad364e8-ed1d-4ced-a6df-4897361c69b1_resources
drwxrwxr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:13 
ee0a2ee7-ff3e-4346-9181-e8e491b1ca15_resources
drwxr-xr-x  2 yi.zhu   yi.zhu4096 Jun  7 18:16 
hsperfdata_yi.zhu

{code}
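
A minimal sketch of the cleanup idea (the wiring and the directory handle are assumptions, not the actual fix): register the session's `_resources` download directory for deletion when the CLI JVM exits.

{code:java}
import java.io.File
import org.apache.spark.util.{ShutdownHookManager, Utils}

// Hedged sketch: delete the per-session "<sessionId>_resources" download
// directory on JVM exit so it does not pile up under the scratch directory.
def registerResourceDirCleanup(resourceDir: File): Unit = {
  ShutdownHookManager.addShutdownHook { () =>
    if (resourceDir.exists()) {
      Utils.deleteRecursively(resourceDir)
    }
  }
}
{code}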




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39351) ShowCreateTable should redact properties

2022-05-31 Thread angerszhu (Jira)
angerszhu created SPARK-39351:
-

 Summary: ShowCreateTable should redact properties
 Key: SPARK-39351
 URL: https://issues.apache.org/jira/browse/SPARK-39351
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


ShowCreateTable should redact properties



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39350) DescribeNamespace should redact properties

2022-05-31 Thread angerszhu (Jira)
angerszhu created SPARK-39350:
-

 Summary: DescribeNamespace should redact properties
 Key: SPARK-39350
 URL: https://issues.apache.org/jira/browse/SPARK-39350
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


DescribeNamespace should redact properties



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan

2022-05-31 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544321#comment-17544321
 ] 

angerszhu commented on SPARK-37609:
---

Increasing -Xss can resolve this, but it would be better to refactor the current
code...
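
For anyone hitting this, one hedged example of raising the thread stack size at submit time (the 16m value is only an illustration; tune as needed):

{noformat}
spark-submit \
  --conf spark.driver.extraJavaOptions=-Xss16m \
  --conf spark.executor.extraJavaOptions=-Xss16m \
  ...
{noformat}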

> Transient StackOverflowError on DataFrame from Catalyst QueryPlan
> -
>
> Key: SPARK-37609
> URL: https://issues.apache.org/jira/browse/SPARK-37609
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.2
> Environment: py:3.9
>Reporter: Rafal Wojdyla
>Priority: Major
>
> I sporadically observe a StackOverflowError from Catalyst's QueryPlan (for a 
> relatively complicated query), below is a stacktrace from the {{count}} on 
> that DF.  It's a bit troubling because it's a transient error, with enough 
> retries (no change to code, probably some kind of cache?), I can get the op 
> to work :(
> {noformat}
> ---
> Py4JJavaError Traceback (most recent call last)
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/dataframe.py 
> in count(self)
> 662 2
> 663 """
> --> 664 return int(self._jdf.count())
> 665 
> 666 def collect(self):
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/java_gateway.py in 
> __call__(self, *args)
>1302 
>1303 answer = self.gateway_client.send_command(command)
> -> 1304 return_value = get_return_value(
>1305 answer, self.gateway_client, self.target_id, self.name)
>1306 
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/utils.py in 
> deco(*a, **kw)
> 109 def deco(*a, **kw):
> 110 try:
> --> 111 return f(*a, **kw)
> 112 except py4j.protocol.Py4JJavaError as e:
> 113 converted = convert_exception(e.java_exception)
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
> 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
> 325 if answer[1] == REFERENCE_TYPE:
> --> 326 raise Py4JJavaError(
> 327 "An error occurred while calling {0}{1}{2}.\n".
> 328 format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o9123.count.
> : java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:188)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan

2022-05-31 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544281#comment-17544281
 ] 

angerszhu commented on SPARK-37609:
---

[~yumwang] It seems to be just a very complex table schema, and it does not reproduce
every time. I am asking the user to try increasing -Xss to see if that resolves the problem.

> Transient StackOverflowError on DataFrame from Catalyst QueryPlan
> -
>
> Key: SPARK-37609
> URL: https://issues.apache.org/jira/browse/SPARK-37609
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.2
> Environment: py:3.9
>Reporter: Rafal Wojdyla
>Priority: Major
>
> I sporadically observe a StackOverflowError from Catalyst's QueryPlan (for a 
> relatively complicated query), below is a stacktrace from the {{count}} on 
> that DF.  It's a bit troubling because it's a transient error, with enough 
> retries (no change to code, probably some kind of cache?), I can get the op 
> to work :(
> {noformat}
> ---
> Py4JJavaError Traceback (most recent call last)
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/dataframe.py 
> in count(self)
> 662 2
> 663 """
> --> 664 return int(self._jdf.count())
> 665 
> 666 def collect(self):
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/java_gateway.py in 
> __call__(self, *args)
>1302 
>1303 answer = self.gateway_client.send_command(command)
> -> 1304 return_value = get_return_value(
>1305 answer, self.gateway_client, self.target_id, self.name)
>1306 
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/pyspark/sql/utils.py in 
> deco(*a, **kw)
> 109 def deco(*a, **kw):
> 110 try:
> --> 111 return f(*a, **kw)
> 112 except py4j.protocol.Py4JJavaError as e:
> 113 converted = convert_exception(e.java_exception)
> ~/miniconda3/envs/tr-dev/lib/python3.9/site-packages/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
> 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
> 325 if answer[1] == REFERENCE_TYPE:
> --> 326 raise Py4JJavaError(
> 327 "An error occurred while calling {0}{1}{2}.\n".
> 328 format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o9123.count.
> : java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:188)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan

2022-05-31 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544192#comment-17544192
 ] 

angerszhu edited comment on SPARK-37609 at 5/31/22 7:43 AM:


Same error in Spark 3.1. The query is simple, but there are so many nested columns that
it sometimes runs into a StackOverflowError.

{code:java}
22/05/26 15:26:48 ERROR ApplicationMaster: User class threw exception: 
java.lang.StackOverflowError
java.lang.StackOverflowError
at scala.collection.TraversableOnce.nonEmpty(TraversableOnce.scala:114)
at scala.collection.TraversableOnce.nonEmpty$(TraversableOnce.scala:114)
at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:108)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)

[jira] [Comment Edited] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan

2022-05-31 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544192#comment-17544192
 ] 

angerszhu edited comment on SPARK-37609 at 5/31/22 7:42 AM:


Same error in Spark 3.1. The query is simple, but there are many nested columns.

{code:java}
22/05/26 15:26:48 ERROR ApplicationMaster: User class threw exception: 
java.lang.StackOverflowError
java.lang.StackOverflowError
at scala.collection.TraversableOnce.nonEmpty(TraversableOnce.scala:114)
at scala.collection.TraversableOnce.nonEmpty$(TraversableOnce.scala:114)
at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:108)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 

[jira] [Commented] (SPARK-37609) Transient StackOverflowError on DataFrame from Catalyst QueryPlan

2022-05-31 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544192#comment-17544192
 ] 

angerszhu commented on SPARK-37609:
---

Same error in spark-3.1

{code:java}
22/05/26 15:26:48 ERROR ApplicationMaster: User class threw exception: 
java.lang.StackOverflowError
java.lang.StackOverflowError
at scala.collection.TraversableOnce.nonEmpty(TraversableOnce.scala:114)
at scala.collection.TraversableOnce.nonEmpty$(TraversableOnce.scala:114)
at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:108)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:408)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:406)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:359)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.rewrite$1(QueryPlan.scala:192)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformUpWithNewOutput$1(QueryPlan.scala:193)
at 

[jira] [Created] (SPARK-39343) DescribeTableExec should redact properties

2022-05-30 Thread angerszhu (Jira)
angerszhu created SPARK-39343:
-

 Summary: DescribeTableExec should redact properties
 Key: SPARK-39343
 URL: https://issues.apache.org/jira/browse/SPARK-39343
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


DescribeTableExec should redact properties



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39342) ShowTablePropertiesCommand/ShowTablePropertiesExec should redact properties.

2022-05-30 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39342:
--
Summary: ShowTablePropertiesCommand/ShowTablePropertiesExec should redact 
properties.  (was: ShowTablePropertiesCommand should redact properties.)

> ShowTablePropertiesCommand/ShowTablePropertiesExec should redact properties.
> 
>
> Key: SPARK-39342
> URL: https://issues.apache.org/jira/browse/SPARK-39342
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> ShowTablePropertiesCommand should redact properties.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39342) ShowTablePropertiesCommand should redact properties.

2022-05-30 Thread angerszhu (Jira)
angerszhu created SPARK-39342:
-

 Summary: ShowTablePropertiesCommand should redact properties.
 Key: SPARK-39342
 URL: https://issues.apache.org/jira/browse/SPARK-39342
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


ShowTablePropertiesCommand should redact properties.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39337) Refactor DescribeTableExec

2022-05-29 Thread angerszhu (Jira)
angerszhu created SPARK-39337:
-

 Summary: Refactor DescribeTableExec
 Key: SPARK-39337
 URL: https://issues.apache.org/jira/browse/SPARK-39337
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


Repeated code, refactor the code.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39337) Refactor DescribeTableExec

2022-05-29 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39337:
--
Description: 
Repeated code, refactor the code.


{code:java}

  private def addTableDetails(rows: ArrayBuffer[InternalRow]): Unit = {
rows += emptyRow()
rows += toCatalystRow("# Detailed Table Information", "", "")
rows += toCatalystRow("Name", table.name(), "")

CatalogV2Util.TABLE_RESERVED_PROPERTIES.foreach(propKey => {
  if (table.properties.containsKey(propKey)) {
rows += toCatalystRow(propKey.capitalize, 
table.properties.get(propKey), "")
  }
})
val properties =
  table.properties.asScala.toList
.filter(kv => !CatalogV2Util.TABLE_RESERVED_PROPERTIES.contains(kv._1))
.sortBy(_._1).map {
case (key, value) => key + "=" + value
  }.mkString("[", ",", "]")
rows += toCatalystRow("Table Properties", properties, "")
  }
{code}


  was:Repeated code, refactor the code.


> Refactor DescribeTableExec
> --
>
> Key: SPARK-39337
> URL: https://issues.apache.org/jira/browse/SPARK-39337
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> Repeated code, refactor the code.
> {code:java}
>   private def addTableDetails(rows: ArrayBuffer[InternalRow]): Unit = {
> rows += emptyRow()
> rows += toCatalystRow("# Detailed Table Information", "", "")
> rows += toCatalystRow("Name", table.name(), "")
> CatalogV2Util.TABLE_RESERVED_PROPERTIES.foreach(propKey => {
>   if (table.properties.containsKey(propKey)) {
> rows += toCatalystRow(propKey.capitalize, 
> table.properties.get(propKey), "")
>   }
> })
> val properties =
>   table.properties.asScala.toList
> .filter(kv => 
> !CatalogV2Util.TABLE_RESERVED_PROPERTIES.contains(kv._1))
> .sortBy(_._1).map {
> case (key, value) => key + "=" + value
>   }.mkString("[", ",", "]")
> rows += toCatalystRow("Table Properties", properties, "")
>   }
> {code}
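
One hedged way the duplication could be factored out (the helper name and signature are made up for illustration): turn the properties map into rows through a single helper that both the reserved-property and user-property sections reuse.

{code:java}
// Illustrative sketch only, not the merged refactor.
private def splitProperties(
    props: Map[String, String],
    reserved: Seq[String]): (Seq[(String, String)], String) = {
  val (reservedProps, userProps) =
    props.toSeq.partition { case (key, _) => reserved.contains(key) }
  // Reserved properties become individual rows; the rest are compacted into
  // the single "Table Properties" row, exactly as addTableDetails does today.
  val compacted =
    userProps.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString("[", ",", "]")
  (reservedProps, compacted)
}
{code}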



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39335) DescribeTableCommand should redact properties

2022-05-29 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39335:
--
Summary: DescribeTableCommand should redact properties  (was: Redact table 
should redact properties)

> DescribeTableCommand should redact properties
> -
>
> Key: SPARK-39335
> URL: https://issues.apache.org/jira/browse/SPARK-39335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> Currently we only redact storage properties when describing a table; normal table
> properties should be redacted too.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39336) Redact table/partition properties

2022-05-29 Thread angerszhu (Jira)
angerszhu created SPARK-39336:
-

 Summary: Redact table/partition properties
 Key: SPARK-39336
 URL: https://issues.apache.org/jira/browse/SPARK-39336
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39335) Redact table should redact properties

2022-05-29 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39335:
--
Parent: SPARK-39336
Issue Type: Sub-task  (was: Task)

> Redact table should redact properties
> -
>
> Key: SPARK-39335
> URL: https://issues.apache.org/jira/browse/SPARK-39335
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> Currently we only redact storage properties when describing a table; normal table
> properties should be redacted too.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39335) Redact table should redact properties

2022-05-29 Thread angerszhu (Jira)
angerszhu created SPARK-39335:
-

 Summary: Redact table should redact properties
 Key: SPARK-39335
 URL: https://issues.apache.org/jira/browse/SPARK-39335
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


Currently we only redact storage properties when describing a table; normal table properties
should be redacted too.
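
A hedged sketch of the shared piece these commands could use (the helper name is made up, and the exact redaction config to apply is a detail of the real fix): run property values through the existing redaction utilities before they are rendered.

{code:java}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.util.Utils

// Hedged sketch: redact table properties before DESCRIBE TABLE /
// SHOW TBLPROPERTIES / SHOW CREATE TABLE turn them into output rows.
def redactTableProperties(props: Map[String, String]): Map[String, String] =
  Utils.redact(SQLConf.get.stringRedactionPattern, props.toSeq).toMap
{code}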



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27442) ParquetFileFormat fails to read column named with invalid characters

2022-05-25 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-27442:
--
Parent: SPARK-36200
Issue Type: Sub-task  (was: Bug)

> ParquetFileFormat fails to read column named with invalid characters
> 
>
> Key: SPARK-27442
> URL: https://issues.apache.org/jira/browse/SPARK-27442
> Project: Spark
>  Issue Type: Sub-task
>  Components: Input/Output
>Affects Versions: 2.0.0, 2.4.1
>Reporter: Jan Vršovský
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.3.0
>
>
> When reading a parquet file which contains characters considered invalid, the 
> reader fails with exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among " 
> ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read 
> them (and allow the user to correct them). However, possible workarounds (such as 
> using alias to rename the column, or forcing another schema) do not work, 
> since the check is done on the input.
> (Possible fix: remove superficial 
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} from 
> {{buildReaderWithPartitionValues}} ?)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39285) Spark should not check filed name when read data

2022-05-25 Thread angerszhu (Jira)
angerszhu created SPARK-39285:
-

 Summary: Spark should not check filed name when read data
 Key: SPARK-39285
 URL: https://issues.apache.org/jira/browse/SPARK-39285
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu


Spark should not check field names when reading data
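
A hedged illustration of the symptom (the path is a placeholder): a Parquet file written by another engine with a column name containing a space or another "invalid" character cannot be read back, even though Spark itself never needs to write such a name.

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("invalid-column-name-read").getOrCreate()

// Placeholder path: the file contains a column named e.g. "col name".
// Reading it currently fails with:
//   AnalysisException: Attribute name "col name" contains invalid character(s)
//   among " ,;{}()\n\t=". Please use alias to rename it.
val df = spark.read.parquet("/path/to/data_written_by_another_engine.parquet")
df.printSchema()
{code}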



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39224) Ignore noisy. warning message in ProcfsMetricsGetter

2022-05-18 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39224:
--
Summary: Ignore noisy. warning message in ProcfsMetricsGetter  (was: Ignore 
noisy. warning message in ProcessMetricsGetter)

> Ignore noisy. warning message in ProcfsMetricsGetter
> 
>
> Key: SPARK-39224
> URL: https://issues.apache.org/jira/browse/SPARK-39224
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> Many users complain about the noisy warning messages from
> ProcfsMetricsGetter
> {code:java}
> 22/05/18 16:48:58 WARN ProcfsMetricsGetter: There was a problem with reading 
> the stat file of the process.
> java.io.FileNotFoundException: /proc/50371/stat (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.<init>(FileInputStream.java:138)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:174)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:176)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2647)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:176)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:216)
>   at 
> scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
>   at scala.collection.immutable.Set$Set2.foreach(Set.scala:132)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.computeAllMetrics(ProcfsMetricsGetter.scala:214)
>   at 
> org.apache.spark.metrics.ProcessTreeMetrics$.getMetricValues(ExecutorMetricType.scala:93)
>   at 
> org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1(ExecutorMetrics.scala:103)
>   at 
> org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1$adapted(ExecutorMetrics.scala:102)
>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at 
> org.apache.spark.executor.ExecutorMetrics$.getCurrentMetrics(ExecutorMetrics.scala:102)
>   at 
> org.apache.spark.SparkContext.reportHeartBeat(SparkContext.scala:2578)
>   at org.apache.spark.SparkContext.$anonfun$new$30(SparkContext.scala:587)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2022)
>   at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39224) Ignore noisy warning message in ProcessMetricsGetter

2022-05-18 Thread angerszhu (Jira)
angerszhu created SPARK-39224:
-

 Summary: Ignore noisy warning message in ProcessMetricsGetter
 Key: SPARK-39224
 URL: https://issues.apache.org/jira/browse/SPARK-39224
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39224) Ignore noisy warning message in ProcessMetricsGetter

2022-05-18 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-39224:
--
Description: 
Many users complain about the noisy warning messages in 
ProcfsMetricsGetter.

{code:java}
22/05/18 16:48:58 WARN ProcfsMetricsGetter: There was a problem with reading 
the stat file of the process.
java.io.FileNotFoundException: /proc/50371/stat (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.(FileInputStream.java:138)
at 
org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:174)
at 
org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:176)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2647)
at 
org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:176)
at 
org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:216)
at 
scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.collection.immutable.Set$Set2.foreach(Set.scala:132)
at 
org.apache.spark.executor.ProcfsMetricsGetter.computeAllMetrics(ProcfsMetricsGetter.scala:214)
at 
org.apache.spark.metrics.ProcessTreeMetrics$.getMetricValues(ExecutorMetricType.scala:93)
at 
org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1(ExecutorMetrics.scala:103)
at 
org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1$adapted(ExecutorMetrics.scala:102)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at 
org.apache.spark.executor.ExecutorMetrics$.getCurrentMetrics(ExecutorMetrics.scala:102)
at 
org.apache.spark.SparkContext.reportHeartBeat(SparkContext.scala:2578)
at org.apache.spark.SparkContext.$anonfun$new$30(SparkContext.scala:587)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2022)
at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
{code}
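
A minimal, self-contained sketch of the direction (the helper below is 
illustrative, not the actual ProcfsMetricsGetter code): a vanished process is 
treated as an expected condition rather than a warning.

{code:java}
import java.io.{BufferedReader, FileInputStream, FileNotFoundException, InputStreamReader}

// Sketch: read /proc/<pid>/stat, staying silent when the process has already
// exited between listing /proc and opening its stat file.
def readProcStat(pid: Long): Option[String] = {
  try {
    val reader = new BufferedReader(
      new InputStreamReader(new FileInputStream(s"/proc/$pid/stat")))
    try Some(reader.readLine()) finally reader.close()
  } catch {
    case _: FileNotFoundException =>
      // Expected on a busy host; log at debug level at most instead of warning.
      None
  }
}
{code}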


> Ignore noisy warning message in ProcessMetricsGetter
> -
>
> Key: SPARK-39224
> URL: https://issues.apache.org/jira/browse/SPARK-39224
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> Many users complain about the noisy warning messages in 
> ProcfsMetricsGetter
> {code:java}
> 22/05/18 16:48:58 WARN ProcfsMetricsGetter: There was a problem with reading 
> the stat file of the process.
> java.io.FileNotFoundException: /proc/50371/stat (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.(FileInputStream.java:138)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.openReader$1(ProcfsMetricsGetter.scala:174)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$addProcfsMetricsFromOneProcess$1(ProcfsMetricsGetter.scala:176)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2647)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.addProcfsMetricsFromOneProcess(ProcfsMetricsGetter.scala:176)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.$anonfun$computeAllMetrics$1(ProcfsMetricsGetter.scala:216)
>   at 
> scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
>   at scala.collection.immutable.Set$Set2.foreach(Set.scala:132)
>   at 
> org.apache.spark.executor.ProcfsMetricsGetter.computeAllMetrics(ProcfsMetricsGetter.scala:214)
>   at 
> org.apache.spark.metrics.ProcessTreeMetrics$.getMetricValues(ExecutorMetricType.scala:93)
>   at 
> org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1(ExecutorMetrics.scala:103)
>   at 
> org.apache.spark.executor.ExecutorMetrics$.$anonfun$getCurrentMetrics$1$adapted(ExecutorMetrics.scala:102)
>   at 

[jira] [Created] (SPARK-39195) Spark should use two step update of outputCommitCoordinator

2022-05-16 Thread angerszhu (Jira)
angerszhu created SPARK-39195:
-

 Summary: Spark should use two step update of 
outputCommitCoordinator
 Key: SPARK-39195
 URL: https://issues.apache.org/jira/browse/SPARK-39195
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39178) When throw SparkFatalException, should show root cause too.

2022-05-13 Thread angerszhu (Jira)
angerszhu created SPARK-39178:
-

 Summary: When throw SparkFatalException, should show root cause 
too.
 Key: SPARK-39178
 URL: https://issues.apache.org/jira/browse/SPARK-39178
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.2.1, 3.3.0
Reporter: angerszhu
 Fix For: 3.4.0


We have a query that throws SparkFatalException without showing the root cause.

{code:java}
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:163)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
at 
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
at 
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
at 
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
at 
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:50)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:746)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at 
org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:92)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.SortExec.doExecute(SortExec.scala:112)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:222)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
at 
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
  

[jira] [Created] (SPARK-39136) JDBCTable support properties

2022-05-10 Thread angerszhu (Jira)
angerszhu created SPARK-39136:
-

 Summary: JDBCTable support properties
 Key: SPARK-39136
 URL: https://issues.apache.org/jira/browse/SPARK-39136
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: angerszhu
 Fix For: 3.4.0



{code:java}
 >
 > desc formatted jdbc.test.people;
NAMEstring
ID  int

# Partitioning
Not partitioned

# Detailed Table Information
Nametest.people
Table Properties[]
Time taken: 0.048 seconds, Fetched 9 row(s)
{code}
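
A minimal sketch of what exposing the options could look like through the DSv2 
Table#properties hook (the class and field names here are illustrative, not the 
actual JDBCTable internals):

{code:java}
import java.util.{Map => JMap}
import scala.jdk.CollectionConverters._

// Sketch: surface the JDBC options a table was resolved with, so that
// DESC FORMATTED shows a non-empty "Table Properties" section.
class JdbcTableWithProperties(options: Map[String, String]) {
  def properties(): JMap[String, String] = options.asJava
}
{code}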




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39136) JDBCTable support properties

2022-05-10 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534220#comment-17534220
 ] 

angerszhu commented on SPARK-39136:
---

Raise a ticket soon

> JDBCTable support properties
> 
>
> Key: SPARK-39136
> URL: https://issues.apache.org/jira/browse/SPARK-39136
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
>  >
>  > desc formatted jdbc.test.people;
> NAME  string
> IDint
> # Partitioning
> Not partitioned
> # Detailed Table Information
> Name  test.people
> Table Properties  []
> Time taken: 0.048 seconds, Fetched 9 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39110) Add metrics properties to Environment page

2022-05-05 Thread angerszhu (Jira)
angerszhu created SPARK-39110:
-

 Summary: Add metrics properties to Environment page
 Key: SPARK-39110
 URL: https://issues.apache.org/jira/browse/SPARK-39110
 Project: Spark
  Issue Type: Task
  Components: Web UI
Affects Versions: 3.3.0
Reporter: angerszhu


We have different ways to load the metrics configuration; users may not be sure 
which one actually takes effect, so we can add this information to the 
Environment tab.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39043) Hive client should not gather statistics by default.

2022-04-27 Thread angerszhu (Jira)
angerszhu created SPARK-39043:
-

 Summary: Hive client should not gather statistics by default.
 Key: SPARK-39043
 URL: https://issues.apache.org/jira/browse/SPARK-39043
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.3.0
Reporter: angerszhu


When `InsertIntoHiveTable` does an insert overwrite into a partition, it calls
Hive.loadPartition(). In this method, when `hive.stats.autogather` is true (the
default), the following happens:

 

{code:java}
  if (oldPart == null) {
    newTPart.getTPartition().setParameters(new HashMap<String, String>());
    if (this.getConf().getBoolVar(HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
      StatsSetupConst.setBasicStatsStateForCreateTable(newTPart.getParameters(),
          StatsSetupConst.TRUE);
    }
  }

public static void setBasicStatsStateForCreateTable(Map<String, String> params,
    String setting) {
  if (TRUE.equals(setting)) {
    for (String stat : StatsSetupConst.supportedStats) {
      params.put(stat, "0");
    }
  }
  setBasicStatsState(params, setting);
}

public static final String[] supportedStats =
    {NUM_FILES, ROW_COUNT, TOTAL_SIZE, RAW_DATA_SIZE};
{code}




This sets the default row count to 0, and since Spark only updates numFiles and
rawSize afterwards, the row count stays 0.

This impacts other systems, such as Presto's CBO.
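
As a workaround sketch (assuming the Hive conf is propagated through the usual 
spark.hadoop.* prefix), the autogather behaviour can be switched off for the session:

{code:java}
import org.apache.spark.sql.SparkSession

// Sketch: disable Hive's automatic stats gathering so loadPartition() does not
// seed a row count of 0 for partitions written by Spark.
val spark = SparkSession.builder()
  .appName("disable-hive-stats-autogather")
  .config("spark.hadoop.hive.stats.autogather", "false")
  .enableHiveSupport()
  .getOrCreate()
{code}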



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-04-15 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38910:
--
Description: 
{code:java}
  ShutdownHookManager.addShutdownHook(priority) { () =>
try {
  val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
  val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts

  if (!finished) {
// The default state of ApplicationMaster is failed if it is 
invoked by shut down hook.
// This behavior is different compared to 1.x version.
// If user application is exited ahead of time by calling 
System.exit(N), here mark
// this application as failed with EXIT_EARLY. For a good shutdown, 
user shouldn't call
// System.exit(0) to terminate the application.
finish(finalStatus,
  ApplicationMaster.EXIT_EARLY,
  "Shutdown hook called before final status was reported.")
  }

  if (!unregistered) {
// we only want to unregister if we don't want the RM to retry
if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
isLastAttempt) {
  unregister(finalStatus, finalMsg)
  cleanupStagingDir(new 
Path(System.getenv("SPARK_YARN_STAGING_DIR")))
}
  }
} catch {
  case e: Throwable =>
logWarning("Ignoring Exception while stopping ApplicationMaster 
from shutdown hook", e)
}
  }{code}

unregister() may throw an exception, so cleaning the staging dir should happen 
before unregister().
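
A minimal sketch of the intended ordering, simplified from the shutdown hook 
above (cleanupStagingDir and unregister stand for the real ApplicationMaster 
methods):

{code:java}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus

// Sketch: clean the staging dir first, so a failure in unregister() can no
// longer leave the sparkStaging directory behind.
if (!unregistered) {
  if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
    cleanupStagingDir(new Path(System.getenv("SPARK_YARN_STAGING_DIR")))
    unregister(finalStatus, finalMsg)
  }
}
{code}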

  was:
{code:java}

{code}


The staging dir is not cleaned when the following condition is matched:
{code:java}
!launcherBackend.isConnected() && fireAndForget
{code}


> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
>   ShutdownHookManager.addShutdownHook(priority) { () =>
> try {
>   val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
>   val isLastAttempt = appAttemptId.getAttemptId() >= maxAppAttempts
>   if (!finished) {
> // The default state of ApplicationMaster is failed if it is 
> invoked by shut down hook.
> // This behavior is different compared to 1.x version.
> // If user application is exited ahead of time by calling 
> System.exit(N), here mark
> // this application as failed with EXIT_EARLY. For a good 
> shutdown, user shouldn't call
> // System.exit(0) to terminate the application.
> finish(finalStatus,
>   ApplicationMaster.EXIT_EARLY,
>   "Shutdown hook called before final status was reported.")
>   }
>   if (!unregistered) {
> // we only want to unregister if we don't want the RM to retry
> if (finalStatus == FinalApplicationStatus.SUCCEEDED || 
> isLastAttempt) {
>   unregister(finalStatus, finalMsg)
>   cleanupStagingDir(new 
> Path(System.getenv("SPARK_YARN_STAGING_DIR")))
> }
>   }
> } catch {
>   case e: Throwable =>
> logWarning("Ignoring Exception while stopping ApplicationMaster 
> from shutdown hook", e)
> }
>   }{code}
> unregister() may throw an exception, so cleaning the staging dir should 
> happen before unregister().



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-04-15 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38910:
--
Description: 
{code:java}

{code}


The staging dir is not cleaned when the following condition is matched:
{code:java}
!launcherBackend.isConnected() && fireAndForget
{code}

  was:

{code:java}
 def run(): Unit = {
submitApplication()
if (!launcherBackend.isConnected() && fireAndForget) {
  val report = getApplicationReport(appId)
  val state = report.getYarnApplicationState
  logInfo(s"Application report for $appId (state: $state)")
  logInfo(formatReportDetails(report, getDriverLogsLink(report)))
  if (state == YarnApplicationState.FAILED || state == 
YarnApplicationState.KILLED) {
throw new SparkException(s"Application $appId finished with status: 
$state")
  }
} else {
  val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
  if (appState == YarnApplicationState.FAILED || finalState == 
FinalApplicationStatus.FAILED) {
var amContainerSucceed = false
val amContainerExitMsg = s"AM Container for " +
  
s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " +
  s"exited with  exitCode: 0"
diags.foreach { err =>
  logError(s"Application diagnostics message: $err")
  if (err.contains(amContainerExitMsg)) {
amContainerSucceed = true
  
{code}


The staging dir is not cleaned when the following condition is matched:
{code:java}
!launcherBackend.isConnected() && fireAndForget
{code}


> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> {code}
> The staging dir is not cleaned when the following condition is matched:
> {code:java}
> !launcherBackend.isConnected() && fireAndForget
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should before unregister()

2022-04-15 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38910:
--
Summary: Clean sparkStaging dir should before unregister()  (was: Clean 
sparkStaging dir should bef)

> Clean sparkStaging dir should before unregister()
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
>  def run(): Unit = {
> submitApplication()
> if (!launcherBackend.isConnected() && fireAndForget) {
>   val report = getApplicationReport(appId)
>   val state = report.getYarnApplicationState
>   logInfo(s"Application report for $appId (state: $state)")
>   logInfo(formatReportDetails(report, getDriverLogsLink(report)))
>   if (state == YarnApplicationState.FAILED || state == 
> YarnApplicationState.KILLED) {
> throw new SparkException(s"Application $appId finished with status: 
> $state")
>   }
> } else {
>   val YarnAppReport(appState, finalState, diags) = 
> monitorApplication(appId)
>   if (appState == YarnApplicationState.FAILED || finalState == 
> FinalApplicationStatus.FAILED) {
> var amContainerSucceed = false
> val amContainerExitMsg = s"AM Container for " +
>   
> s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " +
>   s"exited with  exitCode: 0"
> diags.foreach { err =>
>   logError(s"Application diagnostics message: $err")
>   if (err.contains(amContainerExitMsg)) {
> amContainerSucceed = true
>   
> {code}
> The staging dir is not cleaned when the following condition is matched:
> {code:java}
> !launcherBackend.isConnected() && fireAndForget
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38910) Clean sparkStaging dir should bef

2022-04-15 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38910:
--
Summary: Clean sparkStaging dir should bef  (was: Clean sparkStaging dir 
when WAIT_FOR_APP_COMPLETION is false too)

> Clean sparkStaging dir should bef
> -
>
> Key: SPARK-38910
> URL: https://issues.apache.org/jira/browse/SPARK-38910
> Project: Spark
>  Issue Type: Task
>  Components: YARN
>Affects Versions: 3.2.1, 3.3.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
>  def run(): Unit = {
> submitApplication()
> if (!launcherBackend.isConnected() && fireAndForget) {
>   val report = getApplicationReport(appId)
>   val state = report.getYarnApplicationState
>   logInfo(s"Application report for $appId (state: $state)")
>   logInfo(formatReportDetails(report, getDriverLogsLink(report)))
>   if (state == YarnApplicationState.FAILED || state == 
> YarnApplicationState.KILLED) {
> throw new SparkException(s"Application $appId finished with status: 
> $state")
>   }
> } else {
>   val YarnAppReport(appState, finalState, diags) = 
> monitorApplication(appId)
>   if (appState == YarnApplicationState.FAILED || finalState == 
> FinalApplicationStatus.FAILED) {
> var amContainerSucceed = false
> val amContainerExitMsg = s"AM Container for " +
>   
> s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " +
>   s"exited with  exitCode: 0"
> diags.foreach { err =>
>   logError(s"Application diagnostics message: $err")
>   if (err.contains(amContainerExitMsg)) {
> amContainerSucceed = true
>   
> {code}
> The staging dir is not cleaned when the following condition is matched:
> {code:java}
> !launcherBackend.isConnected() && fireAndForget
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38910) Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is false too

2022-04-14 Thread angerszhu (Jira)
angerszhu created SPARK-38910:
-

 Summary: Clean sparkStaging dir when WAIT_FOR_APP_COMPLETION is 
false too
 Key: SPARK-38910
 URL: https://issues.apache.org/jira/browse/SPARK-38910
 Project: Spark
  Issue Type: Task
  Components: YARN
Affects Versions: 3.2.1, 3.3.0
Reporter: angerszhu



{code:java}
 def run(): Unit = {
submitApplication()
if (!launcherBackend.isConnected() && fireAndForget) {
  val report = getApplicationReport(appId)
  val state = report.getYarnApplicationState
  logInfo(s"Application report for $appId (state: $state)")
  logInfo(formatReportDetails(report, getDriverLogsLink(report)))
  if (state == YarnApplicationState.FAILED || state == 
YarnApplicationState.KILLED) {
throw new SparkException(s"Application $appId finished with status: 
$state")
  }
} else {
  val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
  if (appState == YarnApplicationState.FAILED || finalState == 
FinalApplicationStatus.FAILED) {
var amContainerSucceed = false
val amContainerExitMsg = s"AM Container for " +
  
s"${yarnClient.getApplicationReport(appId).getCurrentApplicationAttemptId} " +
  s"exited with  exitCode: 0"
diags.foreach { err =>
  logError(s"Application diagnostics message: $err")
  if (err.contains(amContainerExitMsg)) {
amContainerSucceed = true
  
{code}


The staging dir is not cleaned when the following condition is matched:
{code:java}
!launcherBackend.isConnected() && fireAndForget
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38498) Support add StreamingListener by conf

2022-03-09 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38498:
--
Description: 
Currently, if a user wants to add a customized StreamingListener to a
StreamingContext, they have to register it in application code:
{code:java}
streamingContext.addStreamingListener()
{code}

We should also support adding it through a configuration, as sketched below.
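
A minimal sketch of the idea, mirroring how spark.extraListeners works for the 
core listener bus (the conf key spark.streaming.extraListeners is an assumption 
here, not an existing setting):

{code:java}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.StreamingListener

// Sketch: instantiate listener classes named in a conf key and register them,
// so users do not have to modify application code.
def registerConfiguredListeners(conf: SparkConf, ssc: StreamingContext): Unit = {
  conf.get("spark.streaming.extraListeners", "")
    .split(",").map(_.trim).filter(_.nonEmpty)
    .foreach { className =>
      val listener = Class.forName(className)
        .getDeclaredConstructor()
        .newInstance()
        .asInstanceOf[StreamingListener]
      ssc.addStreamingListener(listener)
    }
}
{code}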

> Support add StreamingListener by conf
> -
>
> Key: SPARK-38498
> URL: https://issues.apache.org/jira/browse/SPARK-38498
> Project: Spark
>  Issue Type: Task
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> Currently, if a user wants to add a customized StreamingListener to a 
> StreamingContext, they have to register it in application code:
> {code:java}
> streamingContext.addStreamingListener()
> {code}
> We should also support adding it through a configuration.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38498) Support add StreamingListener by conf

2022-03-09 Thread angerszhu (Jira)
angerszhu created SPARK-38498:
-

 Summary: Support add StreamingListener by conf
 Key: SPARK-38498
 URL: https://issues.apache.org/jira/browse/SPARK-38498
 Project: Spark
  Issue Type: Task
  Components: SQL, Structured Streaming
Affects Versions: 3.2.1
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38459) IsolatedClientLoader use built-in hadoop version

2022-03-08 Thread angerszhu (Jira)
angerszhu created SPARK-38459:
-

 Summary: IsolatedClientLoader use built-in hadoop version
 Key: SPARK-38459
 URL: https://issues.apache.org/jira/browse/SPARK-38459
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu


According to https://github.com/apache/spark/pull/34855#discussion_r822266139



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38449) Not call createTable when ifNotExist=true and table exists

2022-03-08 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38449:
--
Description: 
In the current v1 code, when ignoreTableExists = true and the table already 
exists, we still call createTable. It's not necessary.


  was:In the current v1 code, when ignoreTableExists = true and the table 
already exists, we still call createTable.


> Not call createTable when ifNotExist=true and table exists
> --
>
> Key: SPARK-38449
> URL: https://issues.apache.org/jira/browse/SPARK-38449
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> In the current v1 code, when ignoreTableExists = true and the table already 
> exists, we still call createTable. It's not necessary.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38449) Not call createTable when ifNotExist=true and table exists

2022-03-08 Thread angerszhu (Jira)
angerszhu created SPARK-38449:
-

 Summary: Not call createTable when ifNotExist=true and table exists
 Key: SPARK-38449
 URL: https://issues.apache.org/jira/browse/SPARK-38449
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu


In the current v1 code, when ignoreTableExists = true and the table already 
exists, we still call createTable.
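
A minimal sketch of the intended control flow (using the catalog API loosely; 
names approximate the real code paths):

{code:java}
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, SessionCatalog}

// Sketch: when ignoreIfExists is set and the table is already there, skip the
// createTable call entirely instead of issuing it and ignoring the result.
def createTableIfNeeded(
    catalog: SessionCatalog,
    table: CatalogTable,
    ignoreIfExists: Boolean): Unit = {
  if (!(ignoreIfExists && catalog.tableExists(table.identifier))) {
    catalog.createTable(table, ignoreIfExists)
  }
}
{code}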



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38382) Refactor migration guide's sentences

2022-03-01 Thread angerszhu (Jira)
angerszhu created SPARK-38382:
-

 Summary: Refactor migration guide's sentences
 Key: SPARK-38382
 URL: https://issues.apache.org/jira/browse/SPARK-38382
 Project: Spark
  Issue Type: Task
  Components: Documentation
Affects Versions: 3.2.1
Reporter: angerszhu


The current migration guide mixes "Since Spark x.x.x" and "In Spark x.x.x"; we 
should unify the wording.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38358) Add migration guide for spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas

2022-02-28 Thread angerszhu (Jira)
angerszhu created SPARK-38358:
-

 Summary: Add migration guide for 
spark.sql.hive.convertMetastoreInsertDir and spark.sql.hive.convertMetastoreCtas
 Key: SPARK-38358
 URL: https://issues.apache.org/jira/browse/SPARK-38358
 Project: Spark
  Issue Type: Task
  Components: Documentation, SQL
Affects Versions: 3.2.1, 3.1.2, 3.0.3
Reporter: angerszhu


After migrating to Spark 3, many jobs throw exceptions because the data source 
API does not support overwriting a partitioned table while it is being read. A 
workaround sketch is shown below.
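
For reference, a hedged example of the two knobs named in the title; turning 
them off keeps the Hive SerDe path for CTAS and INSERT ... DIRECTORY (the 
session below is illustrative):

{code:java}
import org.apache.spark.sql.SparkSession

// Sketch: fall back to the Hive SerDe writers instead of the converted
// data source writers for these two command types.
val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.convertMetastoreCtas", "false")
  .config("spark.sql.hive.convertMetastoreInsertDir", "false")
  .getOrCreate()
{code}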



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38294) DDLUtils.verifyNotReadPath should check target is subDir

2022-02-22 Thread angerszhu (Jira)
angerszhu created SPARK-38294:
-

 Summary: DDLUtils.verifyNotReadPath should check target is subDir
 Key: SPARK-38294
 URL: https://issues.apache.org/jira/browse/SPARK-38294
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu


{code}
[info]   Cause: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 
0.0 in stage 14.0 (TID 15) (10.12.190.176 executor driver): 
org.apache.spark.SparkException: Task failed while writing rows.
[info]  at 
org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:577)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:345)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$20(FileFormatWriter.scala:252)
[info]  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info]  at org.apache.spark.scheduler.Task.run(Task.scala:136)
[info]  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
[info]  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1475)
[info]  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
[info]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]  at java.lang.Thread.run(Thread.java:748)
[info] Caused by: java.io.FileNotFoundException:
[info] File 
file:/Users/yi.zhu/Documents/project/Angerszh/spark/target/tmp/spark-f1c6b035-e585-4c0e-9b83-17ad54e85978/dt=2020-09-10/part-0-855b7af4-fe2b-4933-807a-6bf40eab11ba.c000.snappy.parquet
 does not exist
[info]
[info] It is possible the underlying files have been updated. You can 
explicitly invalidate
[info] the cache in Spark by running 'REFRESH TABLE tableName' command in SQL 
or by
[info] recreating the Dataset/DataFrame involved.
[info]
[info]  at 
org.apache.spark.sql.errors.QueryExecutionErrors$.readCurrentFileNotFoundError(QueryExecutionErrors.scala:583)
[info]  at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:212)
[info]  at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:270)
[info]  at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
[info]  at 
org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:548)
[info]  at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
 Source)
[info]  at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
[info]  at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info]  at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:328)
[info]  at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1509)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:335)
[info]  ... 9 more
[info]
[info] Driver stacktrace:
{code}
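
A minimal, standalone sketch of a sub-directory-aware check (the real 
DDLUtils.verifyNotReadPath works on the query plan; this only illustrates the 
path comparison):

{code:java}
import org.apache.hadoop.fs.Path

// Sketch: reject the write target not only when it equals a read path but also
// when it is nested under one (e.g. writing dt=2020-09-10 while reading the
// parent table directory).
def isReadPathOrSubDir(target: Path, readPaths: Seq[Path]): Boolean = {
  val t = target.toUri.getPath.stripSuffix("/")
  readPaths.exists { p =>
    val r = p.toUri.getPath.stripSuffix("/")
    t == r || t.startsWith(r + "/")
  }
}
{code}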



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38270) SQL CLI AM should keep same exitcode with client

2022-02-22 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38270:
--
Parent: SPARK-36623
Issue Type: Sub-task  (was: Task)

> SQL CLI AM should keep same exitcode with client
> 
>
> Key: SPARK-38270
> URL: https://issues.apache.org/jira/browse/SPARK-38270
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> Currently for the SQL CLI, we use a shutdown hook to stop the SparkContext:
> {code:java}
> // Clean up after we exit
> ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }
> {code}
> This causes the YARN AM to always report success even when the client exits 
> with a non-zero code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38293) Fix flaky test of HealthTrackerIntegrationSuite

2022-02-22 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496422#comment-17496422
 ] 

angerszhu commented on SPARK-38293:
---

cc [~dongjoon] [~hyukjin.kwon] Have met this several times. 

> Fix flaky test of HealthTrackerIntegrationSuite
> 
>
> Key: SPARK-38293
> URL: https://issues.apache.org/jira/browse/SPARK-38293
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> [info] HealthTrackerIntegrationSuite:
> [info] - If preferred node is bad, without excludeOnFailure job will fail 
> (120 milliseconds)
> [info] - With default settings, job can succeed despite multiple bad 
> executors on node (3 seconds, 78 milliseconds)
> [info] - Bad node with multiple executors, job will still succeed with the 
> right confs *** FAILED *** (61 milliseconds)
> [info]   Map() did not equal Map(0 -> 42, 5 -> 42, 1 -> 42, 6 -> 42, 9 -> 42, 
> 2 -> 42, 7 -> 42, 3 -> 42, 8 -> 42, 4 -> 42) 
> (HealthTrackerIntegrationSuite.scala:94)
> [info]   Analysis:
> [info]   HashMap(0: -> 42, 1: -> 42, 2: -> 42, 3: -> 42, 4: -> 42, 5: -> 42, 
> 6: -> 42, 7: -> 42, 8: -> 42, 9: -> 42)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38293) Fix flaky test of HealthTrackerIntegrationSuite

2022-02-22 Thread angerszhu (Jira)
angerszhu created SPARK-38293:
-

 Summary: Fix flaky test of HealthTrackerIntegrationSuite
 Key: SPARK-38293
 URL: https://issues.apache.org/jira/browse/SPARK-38293
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: angerszhu


{code:java}
[info] HealthTrackerIntegrationSuite:
[info] - If preferred node is bad, without excludeOnFailure job will fail (120 
milliseconds)
[info] - With default settings, job can succeed despite multiple bad executors 
on node (3 seconds, 78 milliseconds)
[info] - Bad node with multiple executors, job will still succeed with the 
right confs *** FAILED *** (61 milliseconds)
[info]   Map() did not equal Map(0 -> 42, 5 -> 42, 1 -> 42, 6 -> 42, 9 -> 42, 2 
-> 42, 7 -> 42, 3 -> 42, 8 -> 42, 4 -> 42) 
(HealthTrackerIntegrationSuite.scala:94)
[info]   Analysis:
[info]   HashMap(0: -> 42, 1: -> 42, 2: -> 42, 3: -> 42, 4: -> 42, 5: -> 42, 6: 
-> 42, 7: -> 42, 8: -> 42, 9: -> 42)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)

{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38289) Refactor SQL CLI exit code related code

2022-02-22 Thread angerszhu (Jira)
angerszhu created SPARK-38289:
-

 Summary: Refactor SQL CLI exit code related code
 Key: SPARK-38289
 URL: https://issues.apache.org/jira/browse/SPARK-38289
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu


Refactor SQL CLI exit code related code



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38270) SQL CLI AM should keep same exitcode with client

2022-02-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-38270:
--
Description: 
Currently for the SQL CLI, we use a shutdown hook to stop the SparkContext:

{code:java}
// Clean up after we exit
ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }

{code}
This causes the YARN AM to always report success even when the client exits 
with a non-zero code. A sketch of a possible direction follows.
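
A minimal sketch of one possible direction (the exit-code plumbing and the 
stop-with-exit-code hook are assumptions, not the current API):

{code:java}
import org.apache.spark.util.ShutdownHookManager

// Sketch (Spark-internal, names hypothetical): remember the CLI exit code so
// the stop path can report a failed final status instead of always succeeding.
object CliExit {
  @volatile private var exitCode: Int = 0

  // stopSparkSQLEnv stands for SparkSQLEnv.stop from the snippet above,
  // hypothetically extended to accept the recorded exit code.
  def install(stopSparkSQLEnv: Int => Unit): Unit =
    ShutdownHookManager.addShutdownHook { () => stopSparkSQLEnv(exitCode) }

  def exit(code: Int): Unit = {
    exitCode = code
    System.exit(code)   // runs the shutdown hook above with the recorded code
  }
}
{code}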


> SQL CLI AM should keep same exitcode with client
> 
>
> Key: SPARK-38270
> URL: https://issues.apache.org/jira/browse/SPARK-38270
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: angerszhu
>Priority: Major
>
> Currently for the SQL CLI, we use a shutdown hook to stop the SparkContext:
> {code:java}
> // Clean up after we exit
> ShutdownHookManager.addShutdownHook { () => SparkSQLEnv.stop() }
> {code}
> This causes the YARN AM to always report success even when the client exits 
> with a non-zero code.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38270) SQL CLI AM should keep same exitcode with client

2022-02-20 Thread angerszhu (Jira)
angerszhu created SPARK-38270:
-

 Summary: SQL CLI AM should keep same exitcode with client
 Key: SPARK-38270
 URL: https://issues.apache.org/jira/browse/SPARK-38270
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38215) InsertIntoHiveDir support convert metadata

2022-02-15 Thread angerszhu (Jira)
angerszhu created SPARK-38215:
-

 Summary: InsertIntoHiveDir support convert metadata
 Key: SPARK-38215
 URL: https://issues.apache.org/jira/browse/SPARK-38215
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu


The current InsertIntoHiveDir command uses the Hive SerDe to write data and 
doesn't support conversion, so such SQL can't write Parquet with zstd.
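
For illustration, a sketch of what conversion would enable (the codec setting 
and SQL are standard Spark; the output path and source table are made up):

{code:java}
import org.apache.spark.sql.SparkSession

// Sketch: once INSERT OVERWRITE DIRECTORY goes through the converted
// (data source) Parquet writer, the session codec applies and zstd works.
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")
spark.sql(
  """INSERT OVERWRITE DIRECTORY '/tmp/zstd_out'
    |STORED AS PARQUET
    |SELECT * FROM src""".stripMargin)
{code}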



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38197) Improve error message of BlockManager.fetchRemoteManagedBuffer

2022-02-13 Thread angerszhu (Jira)
angerszhu created SPARK-38197:
-

 Summary: Improve error message of 
BlockManager.fetchRemoteManagedBuffer
 Key: SPARK-38197
 URL: https://issues.apache.org/jira/browse/SPARK-38197
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1
Reporter: angerszhu
 Fix For: 3.3.0


Some fetch-failed messages do not show the fetch information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38150) Update comment of RelationConversions

2022-02-08 Thread angerszhu (Jira)
angerszhu created SPARK-38150:
-

 Summary: Update comment of RelationConversions
 Key: SPARK-38150
 URL: https://issues.apache.org/jira/browse/SPARK-38150
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1, 3.2.0
Reporter: angerszhu
 Fix For: 3.3.0


The current comment on RelationConversions is not correct.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2022-02-07 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17488612#comment-17488612
 ] 

angerszhu commented on SPARK-35531:
---

Sure.

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.1, 3.2.0
>Reporter: Hongyi Zhang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38043) Refactor FileDataSourceBaseSuite

2022-01-27 Thread angerszhu (Jira)
angerszhu created SPARK-38043:
-

 Summary: Refactor FileDataSourceBaseSuite
 Key: SPARK-38043
 URL: https://issues.apache.org/jira/browse/SPARK-38043
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: angerszhu


Refactor FileDataSourceBaseSuite to build a test framework for data sources.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37985) Fix flaky test SPARK-37578

2022-01-21 Thread angerszhu (Jira)
angerszhu created SPARK-37985:
-

 Summary: Fix flaky test SPARK-37578
 Key: SPARK-37985
 URL: https://issues.apache.org/jira/browse/SPARK-37985
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


2022-01-22T01:58:29.8444339Z [info] - SPARK-36030: 
Report metrics from Datasource v2 write (90 milliseconds)
2022-01-22T01:58:29.9427049Z [info] - SPARK-37578: 
Update output metrics from Datasource v2 *** FAILED *** (65 
milliseconds)
2022-01-22T01:58:29.9428038Z [info]   123 did not 
equal 246 (SQLAppStatusListenerSuite.scala:936)
2022-01-22T01:58:29.9428531Z [info]   
org.scalatest.exceptions.TestFailedException:
2022-01-22T01:58:29.9429101Z [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
2022-01-22T01:58:29.9429717Z [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
2022-01-22T01:58:29.9430298Z [info]   at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
2022-01-22T01:58:29.9430840Z [info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
2022-01-22T01:58:29.9431512Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61(SQLAppStatusListenerSuite.scala:936)
2022-01-22T01:58:29.9432305Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$61$adapted(SQLAppStatusListenerSuite.scala:905)
2022-01-22T01:58:29.9432982Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1(SQLTestUtils.scala:79)
2022-01-22T01:58:29.9433695Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.$anonfun$withTempDir$1$adapted(SQLTestUtils.scala:78)
2022-01-22T01:58:29.9434276Z [info]   at 
org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:221)
2022-01-22T01:58:29.9435040Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.org$apache$spark$sql$test$SQLTestUtils$$super$withTempDir(SQLAppStatusListenerSuite.scala:63)
2022-01-22T01:58:29.9435764Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.withTempDir(SQLTestUtils.scala:78)
2022-01-22T01:58:29.9436354Z [info]   at 
org.apache.spark.sql.test.SQLTestUtils.withTempDir$(SQLTestUtils.scala:77)
2022-01-22T01:58:29.9437063Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.withTempDir(SQLAppStatusListenerSuite.scala:63)
2022-01-22T01:58:29.9437851Z [info]   at 
org.apache.spark.sql.execution.ui.SQLAppStatusListenerSuite.$anonfun$new$60(SQLAppStatusListenerSuite.scala:905)




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37969) Hive Serde insert should check schema before execution

2022-01-20 Thread angerszhu (Jira)
angerszhu created SPARK-37969:
-

 Summary: Hive Serde insert should check schema before execution
 Key: SPARK-37969
 URL: https://issues.apache.org/jira/browse/SPARK-37969
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu



{code:java}
[info]   Cause: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 
in stage 0.0 (TID 0) (10.12.188.15 executor driver): 
java.lang.IllegalArgumentException: Error: : expected at the position 19 of 
'struct' but '(' is found.
[info]  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:384)
[info]  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:355)
[info]  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:507)
[info]  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:329)
[info]  at 
org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:814)
[info]  at 
org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:112)
[info]  at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.(HiveFileFormat.scala:122)
[info]  at 
org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:105)
[info]  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:161)
[info]  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.(FileFormatDataWriter.scala:146)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:313)
[info]  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$20(FileFormatWriter.scala:252)
[info]  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[info]  at org.apache.spark.scheduler.Task.run(Task.scala:136)
[info]  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
[info]  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1475)
[info]  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
[info]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]  at java.lang.Thread.run(Thread.java:748)
[info]







[info]   Cause: java.lang.IllegalArgumentException: field ended by ';': 
expected ';' but got 'IF' at line 2:   optional int32 (IF
[info]   at 
org.apache.parquet.schema.MessageTypeParser.check(MessageTypeParser.java:239)
[info]   at 
org.apache.parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:208)
[info]   at 
org.apache.parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:113)
[info]   at 
org.apache.parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:101)
[info]   at 
org.apache.parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:94)
[info]   at 
org.apache.parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:84)
[info]   at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.getSchema(DataWritableWriteSupport.java:43)
[info]   at 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.init(DataWritableWriteSupport.java:48)
[info]   at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:476)
[info]   at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:430)
[info]   at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:425)
[info]   at 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.(ParquetRecordWriterWrapper.java:70)
[info]   at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:137)
[info]   at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:126)
[info]   at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:286)
[info]   at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:271)
[info]   at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.(HiveFileFormat.scala:132)
[info]   at 
org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:105)
[info]   at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:161)
[info]   at 

[jira] [Updated] (SPARK-37967) ConstantFolding/ Literal.create support ObjectType

2022-01-19 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-37967:
--
Summary: ConstantFolding/ Literal.create support ObjectType  (was: Literal 
support ObjectType)

> ConstantFolding/ Literal.create support ObjectType
> --
>
> Key: SPARK-37967
> URL: https://issues.apache.org/jira/browse/SPARK-37967
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37967) Literal support ObjectType

2022-01-19 Thread angerszhu (Jira)
angerszhu created SPARK-37967:
-

 Summary: Literal support ObjectType
 Key: SPARK-37967
 URL: https://issues.apache.org/jira/browse/SPARK-37967
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


