[jira] [Commented] (SPARK-23886) update query.status

2018-11-15 Thread Efim Poberezkin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688509#comment-16688509
 ] 

Efim Poberezkin commented on SPARK-23886:
-

[~gsomogyi] Sure, feel free to take this over. I don't plan to work on it any 
time soon.

> update query.status
> ---
>
> Key: SPARK-23886
> URL: https://issues.apache.org/jira/browse/SPARK-23886
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Jose Torres
> Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24063) Control maximum epoch backlog

2018-04-24 Thread Efim Poberezkin (JIRA)
Efim Poberezkin created SPARK-24063:
---

 Summary: Control maximum epoch backlog
 Key: SPARK-24063
 URL: https://issues.apache.org/jira/browse/SPARK-24063
 Project: Spark
 Issue Type: Sub-task
 Components: Structured Streaming
 Affects Versions: 2.4.0
 Reporter: Efim Poberezkin


As pointed out by [~joseph.torres] in 
[https://github.com/apache/spark/pull/20936], both the epoch queue and the 
commits/offsets maps can grow without bound as the number of waiting epochs 
increases. Following his proposal, we should introduce a configuration option 
for the maximum epoch backlog and report an error if the number of waiting 
epochs exceeds it.
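
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Spark's EpochCoordinator code: the class name EpochBacklogTracker, the parameter maxEpochBacklog, and the choice to report the error as a Left value are all hypothetical.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the coordinator's bookkeeping of waiting epochs.
// None of these names come from Spark itself.
final class EpochBacklogTracker(maxEpochBacklog: Int) {
  require(maxEpochBacklog > 0, "maxEpochBacklog must be positive")

  // Epochs that have been reported but cannot be committed yet.
  private val waitingEpochs = mutable.Set.empty[Long]

  /** Records a waiting epoch. Returns the backlog size, or an error
    * message once the backlog exceeds the configured maximum. */
  def addWaitingEpoch(epoch: Long): Either[String, Int] = {
    waitingEpochs += epoch
    if (waitingEpochs.size > maxEpochBacklog) {
      Left(s"Epoch backlog of ${waitingEpochs.size} exceeds the maximum of $maxEpochBacklog")
    } else {
      Right(waitingEpochs.size)
    }
  }

  /** Removes an epoch from the backlog once it has been committed. */
  def markCommitted(epoch: Long): Unit = waitingEpochs -= epoch

  def backlogSize: Int = waitingEpochs.size
}
```

A caller would treat a Left result as the signal to fail the query. How the error should actually surface (exception, query termination, or something else) is left open here.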






[jira] [Commented] (SPARK-23886) update query.status

2018-04-09 Thread Efim Poberezkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430267#comment-16430267
 ] 

Efim Poberezkin commented on SPARK-23886:
-

I'd like to work on this issue if nobody is working on it yet.








[jira] [Commented] (SPARK-23747) Add EpochCoordinator unit tests

2018-04-02 Thread Efim Poberezkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422814#comment-16422814
 ] 

Efim Poberezkin commented on SPARK-23747:
-

Okay, will do. Thank you for the clarification.

> Add EpochCoordinator unit tests
> ---
>
> Key: SPARK-23747
> URL: https://issues.apache.org/jira/browse/SPARK-23747
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Jose Torres
> Priority: Major
>







[jira] [Commented] (SPARK-23747) Add EpochCoordinator unit tests

2018-04-02 Thread Efim Poberezkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422020#comment-16422020
 ] 

Efim Poberezkin commented on SPARK-23747:
-

Hello Jose, I'd like to continue working on Continuous Processing, and I think 
I could work on this issue if you're alright with it. Could you please 
elaborate on which tests you think should be implemented for EpochCoordinator, 
and how? Should they cover integration with other components or only the 
internal logic? Also, should the changes I made in SPARK-23503 be tested 
before they are merged?








[jira] [Commented] (SPARK-23503) continuous execution should sequence committed epochs

2018-03-29 Thread Efim Poberezkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419136#comment-16419136
 ] 

Efim Poberezkin commented on SPARK-23503:
-

[~joseph.torres] Good day Jose. From what I understand of the Continuous 
Execution implementation, an epoch coordinator is created per streaming query 
run and can store state. I've added tracking of the last committed epoch and 
of waiting epochs to it to enforce epoch sequencing. Could you please correct 
me if my understanding of your implementation is wrong, and take a look at my 
approach?

> continuous execution should sequence committed epochs
> -
>
> Key: SPARK-23503
> URL: https://issues.apache.org/jira/browse/SPARK-23503
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Jose Torres
> Priority: Major
>
> Currently, the EpochCoordinator doesn't enforce a commit order. If a message 
> for epoch n gets lost in the ether, and epoch n + 1 happens to be ready for 
> commit earlier, epoch n + 1 will be committed.
>  
> This is either incorrect or needlessly confusing, because it's not safe to 
> start from the end offset of epoch n + 1 until epoch n is committed. 
> EpochCoordinator should enforce this sequencing.
>  
> Note that this is not actually a problem right now, because the commit 
> messages go through the same RPC channel from the same place. But we 
> shouldn't implicitly bake this assumption in.
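
The sequencing described in this thread (track the last committed epoch, hold back epochs that arrive early) could look roughly like the sketch below. It is a simplified illustration only, not the code from the pull request or from Spark's EpochCoordinator. The names EpochSequencer, epochReady and lastCommittedEpoch are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch of in-order epoch commits; not Spark's actual code.
final class EpochSequencer(startEpoch: Long) {
  // Highest epoch committed so far. Epochs must be committed consecutively.
  private var lastCommitted: Long = startEpoch - 1

  // Epochs that are ready to commit but whose predecessor is not committed yet.
  private val waiting = mutable.Set.empty[Long]

  /** Marks `epoch` as ready and returns, in order, every epoch that can be
    * committed as a result. Returns Nil while an earlier epoch is missing. */
  def epochReady(epoch: Long): List[Long] = {
    if (epoch > lastCommitted) waiting += epoch
    val committed = mutable.ListBuffer.empty[Long]
    // Commit consecutive epochs for as long as the next one is waiting.
    while (waiting.remove(lastCommitted + 1)) {
      lastCommitted += 1
      committed += lastCommitted
    }
    committed.toList
  }

  def lastCommittedEpoch: Long = lastCommitted
}
```

With this shape, if epoch n + 1 becomes ready before epoch n, it stays in the waiting set and is committed only after epoch n arrives.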






[jira] [Commented] (SPARK-23503) continuous execution should sequence committed epochs

2018-03-21 Thread Efim Poberezkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407545#comment-16407545
 ] 

Efim Poberezkin commented on SPARK-23503:
-

I'd like to work on this one.







[jira] [Commented] (SPARK-20536) Extend ColumnName to create StructFields with explicit nullable

2018-03-14 Thread Efim Poberezkin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398754#comment-16398754
 ] 

Efim Poberezkin commented on SPARK-20536:
-

Hello [~jlaskowski], is this ticket still relevant? I could make a PR with the 
ColumnName extension if it is, but this functionality already seems to be 
present.

> Extend ColumnName to create StructFields with explicit nullable
> ---
>
> Key: SPARK-20536
> URL: https://issues.apache.org/jira/browse/SPARK-20536
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Jacek Laskowski
> Priority: Trivial
>
> {{ColumnName}} defines methods to create {{StructField}}s.
> It would be very user-friendly if there were also methods to create 
> {{StructField}}s with an explicit {{nullable}} property (currently 
> implicitly {{true}}).
> That could look as follows:
> {code}
> // E.g. def int: StructField = StructField(name, IntegerType)
> def int(nullable: Boolean): StructField = StructField(name, IntegerType, nullable)
> // or (untested)
> def int(nullable: Boolean): StructField = int.copy(nullable = nullable)
> {code}


