[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2020-08-05 Thread ming li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171524#comment-17171524
 ] 

ming li commented on FLINK-9373:


cc [~aljoscha]

> Fix potential data losing for RocksDBBackend
> 
>
> Key: FLINK-9373
> URL: https://issues.apache.org/jira/browse/FLINK-9373
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.5.0
>Reporter: Sihua Zhou
>Assignee: Sihua Zhou
>Priority: Blocker
> Fix For: 1.5.0, 1.6.0
>
>
> Currently, when using RocksIterator we only use _iterator.isValid()_ to 
> check whether we have reached the end of the iterator. But that is not 
> enough: according to RocksDB's wiki 
> https://github.com/facebook/rocksdb/wiki/Iterator#error-handling 
> an internal error may exist even if _iterator.isValid()=true_. 
> A safer way to use the _RocksIterator_ is to always call 
> _iterator.status()_ to check for an internal _RocksDB_ error. There is a case 
> from the user mailing list that seems to have lost data because of this: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Missing-MapState-when-Timer-fires-after-restored-state-td20134.html
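
The difference between the two loop styles in the description above can be sketched as follows. This is a minimal illustration and not the RocksDB API: KeyIterator, FailingIterator and StatusCheckDemo are hypothetical names, and the fake assumes that an internal error makes the iterator report itself as invalid early, which is only one way the problem could show up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the iterator calls discussed above; not the RocksDB API. */
interface KeyIterator {
    boolean isValid();
    void next();
    String key();
    /** Throws IllegalStateException if the iterator hit an internal error. */
    void status();
}

/** Fake iterator that stops after failAt entries, simulating an internal error. */
final class FailingIterator implements KeyIterator {
    private final List<String> keys;
    private final int failAt;
    private int pos = 0;

    FailingIterator(List<String> keys, int failAt) {
        this.keys = keys;
        this.failAt = failAt;
    }

    public boolean isValid() { return pos < keys.size() && pos < failAt; }
    public void next() { pos++; }
    public String key() { return keys.get(pos); }

    public void status() {
        // An error is pending only if iteration stopped before the real end.
        if (pos >= failAt && failAt < keys.size()) {
            throw new IllegalStateException("simulated internal iterator error");
        }
    }
}

final class StatusCheckDemo {
    /** Counts keys using isValid() only; an internal error looks like a normal end. */
    static int countUnchecked(KeyIterator it) {
        int n = 0;
        for (; it.isValid(); it.next()) {
            n++;
        }
        return n;
    }

    /** Counts keys, then consults status() so an error surfaces instead of being swallowed. */
    static int countChecked(KeyIterator it) {
        int n = 0;
        for (; it.isValid(); it.next()) {
            n++;
        }
        it.status();
        return n;
    }
}
```

With three keys and a simulated failure after the first one, countUnchecked quietly returns a short count, while countChecked throws, which is the behaviour the issue asks for.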



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2020-07-31 Thread ming li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168554#comment-17168554
 ] 

ming li commented on FLINK-9373:


Hi [~sihuazhou]. Recently I was reading the code of 
flink-statebackend-rocksdb and found that the seekToLast method of 
org.apache.flink.contrib.streaming.state.RocksIteratorWrapper calls 
iterator.seekToFirst. I am puzzled why iterator.seekToLast is not 
called.
{code:java}
@Override
public void seekToLast() {
    iterator.seekToFirst();
    status();
}
{code}
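
One way to make the mismatch visible is a fake iterator that records which call it received. The sketch below is hypothetical throughout: RecordingIterator, QuotedWrapper and ExpectedWrapper are illustrative names, not Flink or RocksDB classes, and the status() call is left out to keep the delegation itself in focus.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the underlying iterator; it only records the calls it receives. */
final class RecordingIterator {
    final List<String> calls = new ArrayList<>();

    void seekToFirst() { calls.add("seekToFirst"); }
    void seekToLast() { calls.add("seekToLast"); }
}

/** Delegation as quoted in the comment above: seekToLast() forwards to seekToFirst(). */
final class QuotedWrapper {
    private final RecordingIterator iterator;

    QuotedWrapper(RecordingIterator iterator) { this.iterator = iterator; }

    void seekToLast() { iterator.seekToFirst(); }
}

/** Delegation the comment seems to expect: seekToLast() forwards to seekToLast(). */
final class ExpectedWrapper {
    private final RecordingIterator iterator;

    ExpectedWrapper(RecordingIterator iterator) { this.iterator = iterator; }

    void seekToLast() { iterator.seekToLast(); }
}
```

A unit test comparing the recorded call list against the expected method name would flag the quoted version.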



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479462#comment-16479462
 ] 

Sihua Zhou commented on FLINK-9373:
---

[~srichter] FYI [3558|https://github.com/facebook/rocksdb/issues/3558], got a 
reply from RocksDB. I think we chose the right way to go ;), because 
the status could be reset.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479035#comment-16479035
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6020




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478954#comment-16478954
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
Thanks @sihuazhou! LGTM. Will merge this.




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478945#comment-16478945
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
cc @StefanRRichter 




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478866#comment-16478866
 ] 

Stefan Richter commented on FLINK-9373:
---

Great, thanks a lot!



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478865#comment-16478865
 ] 

Sihua Zhou commented on FLINK-9373:
---

I will update the PR quickly.



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478861#comment-16478861
 ] 

Stefan Richter commented on FLINK-9373:
---

[~sihuazhou] Will you be able to do this quickly or can I take over because 
this is currently THE release blocker?



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478857#comment-16478857
 ] 

Stefan Richter commented on FLINK-9373:
---

Well, but we are also at risk of having only a partial fix. OK, I suggest that we 
introduce a wrapper and check the status there for all the methods mentioned in the 
documentation:

{{Seek()}}, {{Next()}}, {{SeekToFirst()}}, {{SeekToLast()}}, {{SeekForPrev()}}, 
and {{Prev()}}. That way we are on the safe side and can see if there is any performance 
change. If it turns out that this costs too much, we can still drop it again.
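
The wrapper suggested above could look roughly like the sketch below. It is written against a hypothetical stand-in interface (RawIterator), not the actual RocksDB Java class, and StatusCheckingIterator and CountingIterator are illustrative names; the only point is that every positioning call is followed by a status check.

```java
/** Hypothetical stand-in for the native iterator; not the RocksDB Java API. */
interface RawIterator {
    void seek(byte[] target);
    void seekForPrev(byte[] target);
    void seekToFirst();
    void seekToLast();
    void next();
    void prev();
    /** Throws IllegalStateException if the last operation hit an internal error. */
    void status();
}

/** Wrapper that checks the status after each of the six positioning methods. */
final class StatusCheckingIterator {
    private final RawIterator delegate;

    StatusCheckingIterator(RawIterator delegate) { this.delegate = delegate; }

    void seek(byte[] target) { delegate.seek(target); delegate.status(); }
    void seekForPrev(byte[] target) { delegate.seekForPrev(target); delegate.status(); }
    void seekToFirst() { delegate.seekToFirst(); delegate.status(); }
    void seekToLast() { delegate.seekToLast(); delegate.status(); }
    void next() { delegate.next(); delegate.status(); }
    void prev() { delegate.prev(); delegate.status(); }
}

/** Fake that counts calls so the wrapper's behaviour can be observed in a test. */
final class CountingIterator implements RawIterator {
    int moves = 0;
    int statusChecks = 0;

    public void seek(byte[] target) { moves++; }
    public void seekForPrev(byte[] target) { moves++; }
    public void seekToFirst() { moves++; }
    public void seekToLast() { moves++; }
    public void next() { moves++; }
    public void prev() { moves++; }
    public void status() { statusChecks++; }
}
```

Because callers only see the wrapper, the check cannot be forgotten at individual call sites. The cost is one extra status call per move, which is the performance question raised in the comment.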



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478848#comment-16478848
 ] 

Sihua Zhou commented on FLINK-9373:
---

I think that makes sense.



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread Stefan Richter (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478842#comment-16478842
 ] 

Stefan Richter commented on FLINK-9373:
---

Maybe we can also take the PR basically "as is" for now so that we check the 
status for each iteration and add more checks in the next minor release if this 
turns out to be not enough. Does that make sense?



[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478699#comment-16478699
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
Let's wait a bit more for their response. It seems like this example is 
older than their corrected docs.




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478558#comment-16478558
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
Haven't received any response from RocksDB yet, but I found this example 
that uses `RocksIterator#status()`: 
https://github.com/facebook/rocksdb/blob/3453870677ee2648f38d70fe8aa7fa16a93a96d2/java/samples/src/main/java/RocksDBSample.java




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477592#comment-16477592
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
FYI, I found this issue related to the problem: 
https://github.com/facebook/rocksdb/issues/3558




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477573#comment-16477573
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
Agreed!




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477560#comment-16477560
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
Maybe we should ask them on their issue tracker what the best practice is? 
I cannot remember seeing such checks in their code examples. I have a hard time 
believing that this can be true, because it is not really documented in the 
Java API, and why wouldn't they always call `status` internally?




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477556#comment-16477556
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
It depends, maybe this is already covered currently because we might always 
do an iteration attempt that checks right after the seek. But in general, this 
is not very nice and fragile if true.




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477554#comment-16477554
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
Oh my god... Does that mean we need to wrap the `RocksIterator` and delegate 
all of its API?




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477549#comment-16477549
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
After double-checking with the RocksDB docs, I am afraid that we need to 
introduce more checks, because, for example, they point out that the iterator 
can also become corrupted after methods like `seek`. And if the status flag is 
potentially cleared, that means we need to check in all the places...crazy :-(




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477465#comment-16477465
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
Agreed, we should be correct first before fast! Could you please have a look at 
this? I think it's ready for a look now~




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477459#comment-16477459
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
It sounds cheap if they just `&` all the flags from the sub-iterators. In 
the end, we can see if there is a performance drop, but it is better to be 
correct first and fast second.




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477438#comment-16477438
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
@StefanRRichter I had a look at the implementation of the iterators in 
RocksDB. I found that `status()` simply returns the `_status` flag as the 
result, without any complex computation. But some composite iterators, like 
the `MergingIterator` and `TwoLevelIterator`, need to check all the 
`InternalIterator`s they hold to decide the final status, and I also found 
that the iterator's status could be reset to `OK` in some cases... Hmm... do 
you think this is super cheap or not?
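
To illustrate why the cost differs between a simple and a composite iterator, here is a rough Java sketch. It is not RocksDB's code (which is C++); `SketchStatus`, `ChildIterator`, and `CompositeIterator` are hypothetical names. It only shows the general shape: a composite has to ask each child it holds, so the cost of `status()` grows with the number of children instead of being a single field read.

```java
import java.util.List;

// Hypothetical status type for illustration; RocksDB's real Status is a C++ class.
enum SketchStatus { OK, CORRUPTION }

// Hypothetical child iterator: a simple iterator would just return a stored flag here.
interface ChildIterator {
    SketchStatus status();
}

// Hypothetical composite: its status is derived from all the children it holds.
final class CompositeIterator {
    private final List<ChildIterator> children;

    CompositeIterator(List<ChildIterator> children) {
        this.children = children;
    }

    // Returns the first non-OK status found among the children, or OK if none reports an error.
    SketchStatus status() {
        for (ChildIterator child : children) {
            SketchStatus s = child.status();
            if (s != SketchStatus.OK) {
                return s;
            }
        }
        return SketchStatus.OK;
    }
}
```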




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477414#comment-16477414
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
I'm going to check the native implementation and see whether `status()` 
is a super cheap operation...




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477413#comment-16477413
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
I think I am a bit torn here now...




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477412#comment-16477412
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
That is a good question, and I'm not sure...but I think that seems to be...




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477409#comment-16477409
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
Oh, you are right, this is confusing :-) So does this also mean the status 
flag is cleared when we simply continue iterating and only check at the end?




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477406#comment-16477406
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
I think that is the incorrect one, if I'm not confused by the wiki's 
content...




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477399#comment-16477399
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
Also from the RocksDB docs: `In another word, if Iterator::Valid() is true, 
status() is guaranteed to be OK()`




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477397#comment-16477397
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
Yes, but eventually it will also return `false`, which is essentially the 
same as waiting until the loop terminates. Anyway, I think checking after the 
loop is the nicer way.




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477393#comment-16477393
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
@StefanRRichter No, I think that wouldn't fix this issue. The problem here 
is that even if `iterator.isValid()` returns `true`, there may still be some 
internal error in RocksDB. What do you think?




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477389#comment-16477389
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
If you like the helper method as a way to keep people from forgetting this 
check, you could also run the check inside `isRocksIteratorValid` only when 
the return value is `false`.
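
A possible shape for such a helper, as a sketch only: the method name `isRocksIteratorValid` comes from the comment above, but its signature, the `ValidityProbe` interface, and the `IteratorChecks` class are assumptions made for illustration. `IOException` stands in for `RocksDBException` so the example runs without the RocksDB dependency.

```java
import java.io.IOException;

// Hypothetical stand-in for the two iterator methods discussed here; not the real RocksDB API.
interface ValidityProbe {
    boolean isValid();
    // Throws if the iterator hit an internal error.
    void status() throws IOException;
}

final class IteratorChecks {
    private IteratorChecks() {}

    // Returns true while the iterator is positioned on an entry.
    // Only when isValid() is false is status() consulted, so that an internal
    // error is not mistaken for a normal end of iteration.
    static boolean isRocksIteratorValid(ValidityProbe it) throws IOException {
        boolean valid = it.isValid();
        if (!valid) {
            it.status();
        }
        return valid;
    }
}
```

With this shape the extra `status()` call happens at most once per loop, on the iteration that ends it.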




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477387#comment-16477387
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
@StefanRRichter No, I haven't run any performance tests yet. I think you 
are right! Your proposal is the way I'm going to choose. Addressing this...




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477383#comment-16477383
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/6020
  
A quick general question: could you observe any performance impact from 
calling the `status()` method in the loops? It looks like a native method and I 
am not sure that it is inexpensive. Maybe the better idea is to only check 
`isValid()` in the loops and check `status()` only once after the loop to 
ensure that everything went well and completed. Maybe that is also the reason 
why this is split into two methods in the first place. What do you think?
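
The pattern proposed here can be sketched as follows. This is a minimal illustration under stated assumptions: `SketchIterator` is a hypothetical interface standing in for RocksDB's Java `RocksIterator`, `FakeIterator` is an in-memory fake that simulates an internal error, and `IOException` is used in place of `RocksDBException` so the example runs without the RocksDB dependency. The loop only consults `isValid()`, and `status()` is checked once after the loop ends.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.rocksdb.RocksIterator; not the real API.
interface SketchIterator {
    boolean isValid();
    void next();
    String key();
    // Throws if the iterator hit an internal error.
    void status() throws IOException;
}

// In-memory fake that can simulate an internal error once a given position is reached.
final class FakeIterator implements SketchIterator {
    private final List<String> keys;
    private final int failAt; // position at which the simulated error occurs; -1 means never
    private int pos = 0;

    FakeIterator(List<String> keys, int failAt) {
        this.keys = keys;
        this.failAt = failAt;
    }

    private boolean failed() {
        return failAt >= 0 && pos >= failAt;
    }

    public boolean isValid() { return !failed() && pos < keys.size(); }
    public void next() { pos++; }
    public String key() { return keys.get(pos); }

    public void status() throws IOException {
        if (failed()) {
            throw new IOException("simulated internal iterator error");
        }
    }
}

final class StatusAfterLoop {
    private StatusAfterLoop() {}

    // Collects all keys; throws instead of silently returning a partial result.
    static List<String> readAll(SketchIterator it) throws IOException {
        List<String> out = new ArrayList<>();
        while (it.isValid()) {
            out.add(it.key());
            it.next();
        }
        // isValid() == false means either "end of data" or "error"; status() tells them apart.
        it.status();
        return out;
    }
}
```

Without the final `status()` call, a reader of the fake with a simulated error would get back a shortened list that looks like a complete result, which is the data-loss scenario the issue describes.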




[jira] [Commented] (FLINK-9373) Fix potential data losing for RocksDBBackend

2018-05-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477309#comment-16477309
 ] 

ASF GitHub Bot commented on FLINK-9373:
---

Github user sihuazhou commented on the issue:

https://github.com/apache/flink/pull/6020
  
The reason Travis shows a red light is unrelated...

