[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2022-02-07 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488612#comment-17488612
 ] 

angerszhu commented on SPARK-35531:
---

Sure.

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.1, 3.2.0
>Reporter: Hongyi Zhang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
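The exception arises because the upper-case bucket column name `V1` is compared verbatim against Hive's lower-cased field schemas. A minimal sketch of the case-insensitive resolution the fix needs (hypothetical helper, not Spark's actual code):

```python
def resolve_bucket_columns(bucket_cols, table_cols):
    """Resolve bucket column names against table columns case-insensitively.

    Hive stores field names lower-cased, so a bucket spec written as `V1`
    must be matched ignoring case. Illustrative helper only.
    """
    by_lower = {c.lower(): c for c in table_cols}
    resolved = []
    for col in bucket_cols:
        match = by_lower.get(col.lower())
        if match is None:
            raise ValueError(
                f"Bucket column {col} is not part of the table columns {table_cols}"
            )
        resolved.append(match)
    return resolved
```

With case-insensitive resolution, `V1` maps to the stored field `v1` instead of failing the membership check.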



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38097) Improve the error for pivoting of unsupported value types

2022-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488614#comment-17488614
 ] 

Apache Spark commented on SPARK-38097:
--

User 'yutoacts' has created a pull request for this issue:
https://github.com/apache/spark/pull/35433

> Improve the error for pivoting of unsupported value types
> -
>
> Key: SPARK-38097
> URL: https://issues.apache.org/jira/browse/SPARK-38097
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> The error message from:
> {code:scala}
>   test("Improve the error for pivoting of unsupported value types") {
> trainingSales
>   .groupBy($"sales.year")
>   .pivot(struct(lower($"sales.course"), $"training"))
>   .agg(sum($"sales.earnings"))
>   .show(false)
>   }
> {code}
> can confuse users:
> {code:java}
> The feature is not supported: literal for '[dotnet,Dummies]' of class 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.
> org.apache.spark.SparkRuntimeException: The feature is not supported: literal 
> for '[dotnet,Dummies]' of class 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema.
>   at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:245)
>   at 
> org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.$anonfun$pivot$2(RelationalGroupedDataset.scala:455)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> {code}
> Need to improve the error message and make it more precise.
> See https://github.com/apache/spark/pull/35302#discussion_r793629370
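The confusing part is that the error talks about an unsupported "literal" (the stack trace shows `Literal$.apply` being called from `pivot`) rather than about the pivot operation the user wrote. A sketch of the kind of precise check being asked for (illustrative only; not Spark's error-classes API):

```python
# Types that can serve as pivot column values in this sketch.
SUPPORTED_PIVOT_TYPES = (bool, int, float, str, bytes)

def check_pivot_value(value):
    """Validate a pivot column value, failing with a message that names the
    pivot operation instead of a generic 'literal not supported' error.
    Illustrative sketch only.
    """
    if isinstance(value, SUPPORTED_PIVOT_TYPES):
        return value
    raise TypeError(
        f"Invalid pivot value '{value}': value of type {type(value).__name__} "
        "is not supported as a pivot column; use a literal of a supported type."
    )
```

A struct value such as `[dotnet,Dummies]` would then produce an error pointing at the pivot column, not at literal construction internals.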






[jira] [Created] (SPARK-38135) Introduce `spark.kubernetes.job` scheduling related configurations

2022-02-07 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-38135:
---

 Summary: Introduce `spark.kubernetes.job` scheduling related configurations
 Key: SPARK-38135
 URL: https://issues.apache.org/jira/browse/SPARK-38135
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Yikun Jiang


spark.kubernetes.job.minCPU:  the minimum cpu resources for running
spark.kubernetes.job.minMemory:  the minimum memory resources for running
spark.kubernetes.job.minMember: the minimum number of pods for running
spark.kubernetes.job.priorityClassName: the priority of the running job
spark.kubernetes.job.queue: the queue to which the running job belongs
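As a sketch, the proposed settings would be passed like any other Spark conf (the option names come from this ticket and are not yet in Spark; the master URL, values, and jar path are placeholders):

```shell
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.job.minCPU=4 \
  --conf spark.kubernetes.job.minMemory=8g \
  --conf spark.kubernetes.job.minMember=5 \
  --conf spark.kubernetes.job.priorityClassName=high-priority \
  --conf spark.kubernetes.job.queue=default \
  local:///opt/spark/examples/jars/spark-examples.jar
```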






[jira] [Updated] (SPARK-38135) Introduce `spark.kubernetes.job` scheduling related configurations

2022-02-07 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-38135:

Description: 
spark.kubernetes.job.minCPU:  the minimum cpu resources for running job
spark.kubernetes.job.minMemory:  the minimum memory resources for running job
spark.kubernetes.job.minMember: the minimum number of pods for running job
spark.kubernetes.job.priorityClassName: the priority of the running job
spark.kubernetes.job.queue: the queue to which the running job belongs

  was:
spark.kubernetes.job.minCPU:  the minimum cpu resources for running
spark.kubernetes.job.minMemory:  the minimum memory resources for running
spark.kubernetes.job.minMember: the minimum number of pods for running
spark.kubernetes.job.priorityClassName: the priority of the running job
spark.kubernetes.job.queue: the queue to which the running job belongs


> Introduce `spark.kubernetes.job` scheduling related configurations
> --
>
> Key: SPARK-38135
> URL: https://issues.apache.org/jira/browse/SPARK-38135
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Priority: Major
>
> spark.kubernetes.job.minCPU:  the minimum cpu resources for running job
> spark.kubernetes.job.minMemory:  the minimum memory resources for running job
> spark.kubernetes.job.minMember: the minimum number of pods for running job
> spark.kubernetes.job.priorityClassName: the priority of the running job
> spark.kubernetes.job.queue: the queue to which the running job belongs






[jira] [Updated] (SPARK-38124) Revive HashClusteredDistribution and apply to stream-stream join

2022-02-07 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-38124:
-
Summary: Revive HashClusteredDistribution and apply to stream-stream join  
(was: Revive HashClusteredDistribution and apply to all stateful operators)

> Revive HashClusteredDistribution and apply to stream-stream join
> 
>
> Key: SPARK-38124
> URL: https://issues.apache.org/jira/browse/SPARK-38124
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
>
> SPARK-35703 removed HashClusteredDistribution and replaced its usages with 
> ClusteredDistribution.
> While this works well for non-stateful operators, we still need a separate 
> distribution requirement for stateful operators, because the requirement of 
> ClusteredDistribution is too relaxed while the physical-partitioning 
> requirement of stateful operators is quite strict.
> In most cases, stateful operators must require HashClusteredDistribution as 
> the child distribution, under the following major assumptions:
>  # HashClusteredDistribution creates HashPartitioning, and we will never 
> change it in the future.
>  # We will never change the implementation of {{partitionIdExpression}} in 
> HashPartitioning, so that the Partitioner behaves consistently across Spark 
> versions.
>  # No partitioning except HashPartitioning can satisfy 
> HashClusteredDistribution.
>  
> We should revive HashClusteredDistribution (probably renamed to something 
> specific to stateful operators) and apply the distribution to all stateful 
> operators.
> SPARK-35703 only touched stream-stream join, which means the other stateful 
> operators were already using ClusteredDistribution and hence have been broken 
> for a long time.
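The stability assumptions above matter because state rows are physically placed by partition id, i.e. pmod(hash(keys), numPartitions); if either the hash function or the pmod ever changed, a restarted query would look for its saved state in the wrong partitions. A minimal sketch of such a fixed-for-all-time partitioner (FNV-1a stands in here for Spark's Murmur3; not Spark code):

```python
def fnv1a(data: bytes) -> int:
    """32-bit FNV-1a hash; a deterministic stand-in for Spark's Murmur3."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def partition_id(key: bytes, num_partitions: int) -> int:
    """pmod(hash(key), numPartitions): the contract HashPartitioning's
    partitionIdExpression provides. It must stay identical across Spark
    versions, or previously written state becomes unreadable."""
    return fnv1a(key) % num_partitions
```

The same key always lands in the same partition for a fixed partition count, which is exactly the invariant stateful operators rely on across restarts and upgrades.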






[jira] [Updated] (SPARK-38124) Revive HashClusteredDistribution and apply to stream-stream join

2022-02-07 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-38124:
-
Labels:   (was: correctness)

> Revive HashClusteredDistribution and apply to stream-stream join
> 
>
> Key: SPARK-38124
> URL: https://issues.apache.org/jira/browse/SPARK-38124
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>
> SPARK-35703 removed HashClusteredDistribution and replaced its usages with 
> ClusteredDistribution.
> While this works well for non-stateful operators, we still need a separate 
> distribution requirement for stateful operators, because the requirement of 
> ClusteredDistribution is too relaxed while the physical-partitioning 
> requirement of stateful operators is quite strict.
> In most cases, stateful operators must require HashClusteredDistribution as 
> the child distribution, under the following major assumptions:
>  # HashClusteredDistribution creates HashPartitioning, and we will never 
> change it in the future.
>  # We will never change the implementation of {{partitionIdExpression}} in 
> HashPartitioning, so that the Partitioner behaves consistently across Spark 
> versions.
>  # No partitioning except HashPartitioning can satisfy 
> HashClusteredDistribution.
>  
> We should revive HashClusteredDistribution (probably renamed to something 
> specific to stateful operators) and apply the distribution to all stateful 
> operators.
> SPARK-35703 only touched stream-stream join, which means stream-stream join 
> hasn't been broken in actual releases. Let's aim for a partial revert of 
> SPARK-35703 in this ticket, and file another ticket for the other stateful 
> operators, which have been broken since their introduction (2.2+).






[jira] [Updated] (SPARK-38124) Revive HashClusteredDistribution and apply to stream-stream join

2022-02-07 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-38124:
-
Description: 
SPARK-35703 removed HashClusteredDistribution and replaced its usages with 
ClusteredDistribution.

While this works well for non-stateful operators, we still need a separate 
distribution requirement for stateful operators, because the requirement of 
ClusteredDistribution is too relaxed while the physical-partitioning 
requirement of stateful operators is quite strict.

In most cases, stateful operators must require HashClusteredDistribution as 
the child distribution, under the following major assumptions:
 # HashClusteredDistribution creates HashPartitioning, and we will never 
change it in the future.
 # We will never change the implementation of {{partitionIdExpression}} in 
HashPartitioning, so that the Partitioner behaves consistently across Spark 
versions.
 # No partitioning except HashPartitioning can satisfy 
HashClusteredDistribution.

We should revive HashClusteredDistribution (probably renamed to something 
specific to stateful operators) and apply the distribution to all stateful 
operators.

SPARK-35703 only touched stream-stream join, which means stream-stream join 
hasn't been broken in actual releases. Let's aim for a partial revert of 
SPARK-35703 in this ticket, and file another ticket for the other stateful 
operators, which have been broken since their introduction (2.2+).

  was:
SPARK-35703 removed HashClusteredDistribution and replaced its usages with 
ClusteredDistribution.

While this works well for non-stateful operators, we still need a separate 
distribution requirement for stateful operators, because the requirement of 
ClusteredDistribution is too relaxed while the physical-partitioning 
requirement of stateful operators is quite strict.

In most cases, stateful operators must require HashClusteredDistribution as 
the child distribution, under the following major assumptions:
 # HashClusteredDistribution creates HashPartitioning, and we will never 
change it in the future.
 # We will never change the implementation of {{partitionIdExpression}} in 
HashPartitioning, so that the Partitioner behaves consistently across Spark 
versions.
 # No partitioning except HashPartitioning can satisfy 
HashClusteredDistribution.

We should revive HashClusteredDistribution (probably renamed to something 
specific to stateful operators) and apply the distribution to all stateful 
operators.

SPARK-35703 only touched stream-stream join, which means the other stateful 
operators were already using ClusteredDistribution and hence have been broken 
for a long time.


> Revive HashClusteredDistribution and apply to stream-stream join
> 
>
> Key: SPARK-38124
> URL: https://issues.apache.org/jira/browse/SPARK-38124
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
>
> SPARK-35703 removed HashClusteredDistribution and replaced its usages with 
> ClusteredDistribution.
> While this works well for non-stateful operators, we still need a separate 
> distribution requirement for stateful operators, because the requirement of 
> ClusteredDistribution is too relaxed while the physical-partitioning 
> requirement of stateful operators is quite strict.
> In most cases, stateful operators must require HashClusteredDistribution as 
> the child distribution, under the following major assumptions:
>  # HashClusteredDistribution creates HashPartitioning, and we will never 
> change it in the future.
>  # We will never change the implementation of {{partitionIdExpression}} in 
> HashPartitioning, so that the Partitioner behaves consistently across Spark 
> versions.
>  # No partitioning except HashPartitioning can satisfy 
> HashClusteredDistribution.
>  
> We should revive HashClusteredDistribution (probably renamed to something 
> specific to stateful operators) and apply the distribution to all stateful 
> operators.
> SPARK-35703 only touched stream-stream join, which means stream-stream join 
> hasn't been broken in actual releases. Let's aim for a partial revert of 
> SPARK-35703 in this ticket, and file another ticket for the other 
> stateful operators, which have been broken since their introduction (2.2+).






[jira] [Created] (SPARK-38136) Update GitHub Action test image

2022-02-07 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38136:
-

 Summary: Update GitHub Action test image
 Key: SPARK-38136
 URL: https://issues.apache.org/jira/browse/SPARK-38136
 Project: Spark
  Issue Type: Test
  Components: Project Infra, Tests
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-38136) Update GitHub Action test image

2022-02-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488638#comment-17488638
 ] 

Apache Spark commented on SPARK-38136:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35434

> Update GitHub Action test image
> ---
>
> Key: SPARK-38136
> URL: https://issues.apache.org/jira/browse/SPARK-38136
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-38136) Update GitHub Action test image

2022-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38136:


Assignee: Apache Spark

> Update GitHub Action test image
> ---
>
> Key: SPARK-38136
> URL: https://issues.apache.org/jira/browse/SPARK-38136
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-38136) Update GitHub Action test image

2022-02-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38136:


Assignee: (was: Apache Spark)

> Update GitHub Action test image
> ---
>
> Key: SPARK-38136
> URL: https://issues.apache.org/jira/browse/SPARK-38136
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-37500) Incorrect scope when using named_windows in CTEs

2022-02-07 Thread Lauri Koobas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488649#comment-17488649
 ] 

Lauri Koobas commented on SPARK-37500:
--

Indeed, this particular case is fixed, but I believe it was fixed in a bad way.

This works (notice the JOIN is commented out):
{noformat}
create or replace temporary view test_temp_view as 
with step_1 as (
  select *
  , min(a) over w2 as min_a_over_w1
  from (select 1 as a, 2 as b, 3 as c)
  window w2 as (partition by b order by c)
)
, step_2 as (
  select *
  from (select 1 as e, 2 as f, 3 as g)
  --join step_1 on true
  window w1 as (partition by f order by g)
)
select *
from step_2
{noformat}
This doesn't:
{noformat}
create or replace temporary view test_temp_view as 
with step_1 as (
  select *
  , min(a) over w2 as min_a_over_w1
  from (select 1 as a, 2 as b, 3 as c)
  window w2 as (partition by b order by c)
)
, step_2 as (
  select *
  from (select 1 as e, 2 as f, 3 as g)
  join step_1 on true
  window w1 as (partition by f order by g)
)
select *
from step_2
{noformat}
It seems that JOINing one CTE that has a named window to another CTE with a 
named window overwrites/clears some scope.

Somehow it also ONLY applies when creating a view, but not when just running 
the query.

> Incorrect scope when using named_windows in CTEs
> 
>
> Key: SPARK-37500
> URL: https://issues.apache.org/jira/browse/SPARK-37500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
> Environment: Databricks Runtime 9.0, 9.1, 10.0
>Reporter: Lauri Koobas
>Priority: Major
>
> This works, but shouldn't. The named_window is defined outside the CTE that 
> uses it.
> {code:sql}
> with step_1 as (
>   select *
>   , min(a) over w1 as min_a_over_w1
>   from (select 1 as a, 2 as b, 3 as c)
> )
> select *
> from step_1
> window w1 as (partition by b order by c)
> {code}
>  
>  






[jira] [Commented] (SPARK-38033) The structured streaming processing cannot be started because the commitId and offsetId are inconsistent

2022-02-07 Thread LeeeeLiu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488650#comment-17488650
 ] 

LLiu commented on SPARK-38033:
--

Hi [~kabhwan], this is a good idea (y), thanks for your suggestion. I'll try 
to provide a better error message.

> The structured streaming processing cannot be started because the commitId 
> and offsetId are inconsistent
> 
>
> Key: SPARK-38033
> URL: https://issues.apache.org/jira/browse/SPARK-38033
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.6, 3.0.3
>Reporter: LLiu
>Priority: Major
>
> The streaming processing could not start due to an unexpected machine 
> shutdown. The exception is as follows:
>  
> {code:java}
> ERROR 22/01/12 02:48:36 MicroBatchExecution: Query 
> streaming_4a026335eafd4bb498ee51752b49f7fb [id = 
> 647ba9e4-16d2-4972-9824-6f9179588806, runId = 
> 92385d5b-f31f-40d0-9ac7-bb7d9796d774] terminated with error
> java.lang.IllegalStateException: batch 113258 doesn't exist
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$4.apply(MicroBatchExecution.scala:256)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$4.apply(MicroBatchExecution.scala:256)
>         at scala.Option.getOrElse(Option.scala:121)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:255)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:169)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>         at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>         at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
>         at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>         at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
>         at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
>         at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> {code}
> I checked the checkpoint files on HDFS and found that the latest offset is 
> 113259, but the latest commit is 113257:
> {code:java}
> commits
> /tmp/streaming_/commits/113253
> /tmp/streaming_/commits/113254
> /tmp/streaming_/commits/113255
> /tmp/streaming_/commits/113256
> /tmp/streaming_/commits/113257
> offset
> /tmp/streaming_/offsets/113253
> /tmp/streaming_/offsets/113254
> /tmp/streaming_/offsets/113255
> /tmp/streaming_/offsets/113256
> /tmp/streaming_/offsets/113257
> /tmp/streaming_/offsets/113259{code}
> Finally, I deleted the offset file “/tmp/streaming_/offsets/113259” and the 
> program started normally. I think there is a problem here, and we should try 
> to handle this exception or give some guidance in the log.
>  
>  
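On restart, the micro-batch engine takes the latest entry in offsets/ and re-runs it, which requires the predecessor batch to exist in the offset log; the gap between 113257 and 113259 is what triggers "batch 113258 doesn't exist". A sketch of the kind of reconciliation and diagnostic the reporter is asking for (illustrative only, not Spark's recovery code):

```python
def batch_to_resume(offset_batches, commit_batches):
    """Pick the batch to re-run from checkpoint metadata, failing with an
    actionable message when the offset log has a gap. Illustrative only."""
    latest = max(offset_batches)
    last_committed = max(commit_batches) if commit_batches else -1
    # The engine needs the previous batch's offsets to recompute the latest
    # uncommitted batch; a gap means the checkpoint was left inconsistent.
    if latest - 1 > last_committed and latest - 1 not in offset_batches:
        raise RuntimeError(
            f"Offset log is inconsistent: batch {latest} exists but batch "
            f"{latest - 1} is missing and uncommitted. The checkpoint was "
            f"likely corrupted by an unclean shutdown; removing "
            f"offsets/{latest} resumes from the last committed batch "
            f"({last_committed})."
        )
    return latest
```

Against the listing above (offsets up to 113259 with 113258 missing, commits up to 113257), this raises the descriptive error instead of the bare "batch 113258 doesn't exist".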





