[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-06-15 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554655#comment-17554655
 ] 

Etienne Chauchot commented on FLINK-26793:
--

> Just to double-check: Can we optimize anything on Flink side to lower the 
> performance drop on restart

CassandraSink does not store any state: when a snapshot is requested, it just 
waits for the in-flight requests to finish



 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: documentation, pull-request-available
> Attachments: Capture d’écran de 2022-04-14 16-34-59.png, Capture 
> d’écran de 2022-04-14 16-35-07.png, Capture d’écran de 2022-04-14 
> 16-35-30.png, jobmanager_log.txt, taskmanager_127.0.1.1_33251-af56fa_log
>
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-06-13 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553465#comment-17553465
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~arvid], I went through the code as promised: flink code cannot influence the 
cassandra cluster behavior as described above (there is no timeout options 
available in the driver) just the cassandra cluster conf option mentioned 
above. So, I propose to submit a PR to add this documentation to the sinks and 
outputformats.

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: documentation
> Attachments: Capture d’écran de 2022-04-14 16-34-59.png, Capture 
> d’écran de 2022-04-14 16-35-07.png, Capture d’écran de 2022-04-14 
> 16-35-30.png, jobmanager_log.txt, taskmanager_127.0.1.1_33251-af56fa_log
>
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-05-09 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533731#comment-17533731
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~arvid] just to let you know, I have not forgotten the last task on this 
ticket to check if we can improve  the code, I'm just waiting for the 
[cassandra upgrade PR|https://github.com/apache/flink/pull/19586] to be merged 
before testing the perf again and see if something can be improved 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: Capture d’écran de 2022-04-14 16-34-59.png, Capture 
> d’écran de 2022-04-14 16-35-07.png, Capture d’écran de 2022-04-14 
> 16-35-30.png, jobmanager_log.txt, taskmanager_127.0.1.1_33251-af56fa_log
>
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-26 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528228#comment-17528228
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~arvid] thanks for your comment. I don't think either that there is anything 
to do at the Flink level but I'll make sure by a deep second round to see if we 
can improve something.

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: Capture d’écran de 2022-04-14 16-34-59.png, Capture 
> d’écran de 2022-04-14 16-35-07.png, Capture d’écran de 2022-04-14 
> 16-35-30.png, jobmanager_log.txt, taskmanager_127.0.1.1_33251-af56fa_log
>
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-24 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527329#comment-17527329
 ] 

Arvid Heise commented on FLINK-26793:
-

Just to double-check: Can we optimize anything on Flink side to lower the 
performance drop on restart? For me, it looks like there is nothing and we 
should just include the information about the timeout on the Cassandra 
documentation page (maybe even in the Javadocs).

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: Capture d’écran de 2022-04-14 16-34-59.png, Capture 
> d’écran de 2022-04-14 16-35-07.png, Capture d’écran de 2022-04-14 
> 16-35-30.png, jobmanager_log.txt, taskmanager_127.0.1.1_33251-af56fa_log
>
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-14 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522355#comment-17522355
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] I reproduced the behavior you observed: I ran [this streaming 
infinite 
pipeline|https://github.com/echauchot/flink-samples/blob/master/src/main/java/org/example/CassandraPojoSinkStreamingExample.java]
 for 3 days on a local flink 1.14.4 cluster + cassandra 3.0 docker (I did not 
have rights to instanciate a cassandra cluster on Azure). The pipeline has 
checkpointing configured every 10 min with exactly once semantics and no 
watermark defined. It was run at parallelism 16 which corresponds to the number 
of cores on my laptop. I created a source that gives pojos every 100 ms.  The 
source is mono-threaded so at parallelism 1.  See all the screenshots

I ran the pipeline for more than 72 hours and indeed after little less than 
72h, I got an exception from Cassandra cluster see task manager log:

{code:java}
2022-04-13 16:38:15,227 ERROR 
org.apache.flink.streaming.connectors.cassandra.CassandraPojoSink [] - Error 
while sending value.
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
during write query at consistency LOCAL_ONE (1 replica were required but only 0 
acknowledged the write)
{code}
This exception means that Cassandra coordinator node (internal Cassandra) 
waited too long for an internal replication (raplication to another node in the 
same casssandra "datacenter") and did not ack the write. 

This led to a failure of the write task and to a restoration of the job from 
the last checkpoint see job manager log:

{code:java}
2022-04-13 16:38:20,847 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job Cassandra 
Pojo Sink Streaming example (dc7522bc1855f6f98038ac2b4eed4095) switched from 
state RESTARTING to RUNNING.
2022-04-13 16:38:20,850 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Restoring job 
dc7522bc1855f6f98038ac2b4eed4095 from Checkpoint 136 @ 1649858983772 for 
dc7522bc1855f6f98038ac2b4eed4095 located at 
file:/tmp/flink-checkpoints/dc7522bc1855f6f98038ac2b4eed4095/chk-136.
{code}

Obviously, this restoration led to the re-creation of the _CassandraPojoSink_ 
and to the re-creation of the related _MappingManager_

So in short, this is exactly what I supposed in my previous comments. Restoring 
from checkpoints  slow down you writes (job restart time +  cassandra driver 
state re-creation - connection,  prepared statements etc... -)

The problem is that the timeout comes from Cassandra itself not from Flink and 
it is normal that Flink restores the job in such circumstances. 
What you can do is to increase the Cassandra write timeout to your workload in 
your Cassandra cluster so that such errors do not happen. For that you need to 
raise _write_request_timeout_in_ms_  conf parameter in your _cassandra.yml_.

I do not recommend that you lower the replication factor in your Cassandra 
cluster (I did that only from local tests on Flink) because it is mandatory 
that you do not loose data in case of your Cassandra cluster failure. Waiting 
for a single replica for write acknowledge is the minimum level for this 
guarantee.

Best
Etienne

  !Capture d’écran de 2022-04-14 16-34-59.png! 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
> Attachments: Capture d’écran de 2022-04-14 16-34-59.png
>
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-06 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518255#comment-17518255
 ] 

Jay Ghiya commented on FLINK-26793:
---

absolutely. [~echauchot] makes sense. we have also started the run again today. 
To answer your first question we have not observed "query is not prepared" . 
for the second question -> we do not have logs for our previous run now.  We 
are rerunning. [~chesnay]-> Hi. so we did not observe back pressure in terms of 
operator metrics. But we wanted to check whether the implementation (the task) 
is creating only instance of mapping manager and that [~echauchot] confirmed. 
so we are good on that.

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-06 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518112#comment-17518112
 ] 

Chesnay Schepler commented on FLINK-26793:
--

[~bumblebee] Have you actually observed performance issues though? Is the sink 
actually to slow for you (i.e., is the sink causing back-pressure?)

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-06 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518077#comment-17518077
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee], in the meantime, I made a [pure-flink test 
pipeline|https://github.com/echauchot/flink/blob/FLINK-26793-cassandra-perf/flink-connectors/flink-connector-cassandra/src/test/java/org/apache/flink/streaming/connectors/cassandra/example/CassandraPojoSinkStreamingExample.java]
 that runs in streaming mode (so it is infinite) with a fake source that 
provides Pojos every 100 ms and write them to Cassandra. I've run this pipeline 
locally for 2 hours with no perf log messages. I'll try to run them for 72 
hours or more  on my Azure environment (hoping that I can get a Cassandra 
instance there). That way, I'll try to reproduce the performance issue logs you 
had. If I can't reproduce after some days of running this pipeline, I think the 
good way to proceed is to take a look at your Flink restore stats hence my 
question above or take a look at the Scylla connector you use.

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-06 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517914#comment-17517914
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] thanks for the first answer. Can you answer also the other 2 
questions of my comment above ?

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-05 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517570#comment-17517570
 ] 

Jay Ghiya commented on FLINK-26793:
---

Hi [~echauchot] - Regardin "query is not prepared ..." -> No this is not 
observed. So it is possible that restarts would have happened. we have seen 
that quite frequent in our aks dev environment. I might not be able to check 
previous logs as of now in terms of checkpoints as we run lyft flink operator. 
But i can start one more reliability run to confirm the same. Will keep you 
posted.i should have update by friday end of day. Hope this helps :) 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-04 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516904#comment-17516904
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] in particular do you see in Flink logs the message you referred 
from stackoverflow: "{_}query is not prepared on /XX.YY.91.205:9042, preparing 
before retrying executing. Seeing this message a few times is fine, but seeing 
it a lot may be source of performance problems"{_} in addition to the scylla 
message you referred in the ticket.

Also, please go to Flink monitoring UI in the _Checkpoints/Overview_ tab and 
search for _Latest Restore._ That way we can check if there was a checkpoint 
restore and validate/invalidate my supposition about restoring entailing a 
re-creation of the MappingManager (in case it is not part of the Sink Operator 
snapshot state).

Also please confirm that in the code you showed above _datastream_ object 
contains POJOs and not tuples so that we make sure the pipeline indeed uses 
_CassandraPojoSink_ (sink instanciation is automatic depending on the content 
of your DataStream)

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-04-01 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515978#comment-17515978
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] The only thing I can think of is that _CassandraPojoSink_ was 
recreated for example after a failure/recovery. In that case it would recreate 
the _MappingManager_ and thus loose the prepared statements stored in it. Do 
you see anything in the Flink logs ?

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-27 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512859#comment-17512859
 ] 

Jay Ghiya commented on FLINK-26793:
---

[~echauchot] thanks for initial analysis -> looks like the mapper obj is 
created only once for one task slot. then i am wondering why are we getting 
this error. Let me also get familiar with the given codebase. it happens after 
48-72 hours of run. 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-27 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512858#comment-17512858
 ] 

Jay Ghiya commented on FLINK-26793:
---

Hi [~echauchot] -> We use following piece of code:

{code:java}
// Some comments here
CassandraSink.addSink(DataStream).setClusterBuilder(ClusterObj).setDefaultKeyspace(xx).setMaxConcurrentRequests(numberOfTaskSlots).build()
{code}


> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-25 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512513#comment-17512513
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] copying what I wrote on the ML:

On a regular pipeline such as 
_org.apache.flink.streaming.connectors.cassandra.example.CassandraPojoSinkExample_
 with a parallelism of 4 you will have 4 _CassandraPojoSink_ with for each, one 
{_}MappingManager{_}, one Cassandra _Cluster_ and one Cassandra _Session._ This 
is perfectly right as the sinks in production could reside on different 
taskManagers on different machines.

You can test it yourself by running in debug mode _CassandraPojoSinkExample_ 
locally and count the number of these objects in your local JVM. Don't forget 
to start a local Cassandra cluster (with docker for example) listening on 
127.0.0.1:9042 and create the keyspace and table described in 
{_}CassandraPojoSinkExample{_}.

 

Could you give more details on your particular issue ?

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-24 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511677#comment-17511677
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] how do you create the cassandra sink ? Do you use standard 
_CassandraSink.addSink(DataStream)_ or do you manually create the sink and call 
{_}CassandraPojoSink#open(){_}. How  does it work with Scylla driver ?

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Assignee: Etienne Chauchot
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-23 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511086#comment-17511086
 ] 

Martijn Visser commented on FLINK-26793:


I've assigned [~echauchot] to this ticket. Let's create some follow-up tickets 
for other Cassandra related activities. 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-23 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511078#comment-17511078
 ] 

Martijn Visser commented on FLINK-26793:


Thanks all! 
Also good to read up on are the FLIPs that explain the APIs:

FLIP-27 for the Source API: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
FLIP-143 for the Sink API: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API 
+ FLIP-171 for the ASync Sink 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510720#comment-17510720
 ] 

Jay Ghiya commented on FLINK-26793:
---

awesome thankyou [~echauchot] ! grateful!

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510717#comment-17510717
 ] 

Etienne Chauchot commented on FLINK-26793:
--

[~bumblebee] splitting looks good to me. I guess migration to new API Jira 
already exists. As I know the current code I could take care of the urgent perf 
issue. If you need some details about the new API there is a talk from [~arvid] 
that explains it: 
[https://www.youtube.com/watch?v=LCMfbGv38u8&list=PLDX4T_cnKjD0J2LFr7yBk2aSS_o2l-7ue]

 

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510528#comment-17510528
 ] 

Jay Ghiya commented on FLINK-26793:
---

[~echauchot]  [~martijnvisser]- could we take this task and split into two 
activities. First being the solving the performance issue so that product 
releases are not halted. and second being the migration to new api?

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Jay Ghiya (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510521#comment-17510521
 ] 

Jay Ghiya commented on FLINK-26793:
---

[~martijnvisser] thankyou for explaining the gravity of the current situation. 
Ready to contribute as flink has been a fundamental block for our product. 
always grateful for that. [~echauchot] count me in. But i need information 
about latest source/sink api and the old source/sink api that flink connector 
is based on. i will also try to pull people from my current org.

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510394#comment-17510394
 ] 

Etienne Chauchot commented on FLINK-26793:
--

I just saw this ticket. I think solving this perf problem now even if we plan 
to migrate this connector to the new source/sink api makes sense. 
[~martijnvisser] I can take car of it if you agree

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26793) Flink Cassandra connector performance issue

2022-03-22 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510320#comment-17510320
 ] 

Martijn Visser commented on FLINK-26793:


[~bumblebee] The Cassandra connector isn't actively maintained at this moment 
(we're still open/looking for new contributors). Hopefully we'll find some or 
else we'll have to drop the Cassandra connector completely (especially since 
it's not using the target Source/Sink API)

> Flink Cassandra connector performance issue 
> 
>
> Key: FLINK-26793
> URL: https://issues.apache.org/jira/browse/FLINK-26793
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Cassandra
>Affects Versions: 1.14.4
>Reporter: Jay Ghiya
>Priority: Major
>
> A warning is observed during long runs of flink job stating “Insertions into 
> scylla might be suffering. Expect performance problems unless this is 
> resolved.”
> Upon initial analysis - “flink cassandra connector is not keeping instance of 
> mapping manager that is used to convert a pojo to cassandra row. Ideally the 
> mapping manager should have the same life time as cluster and session objects 
> which are also created once when the driver is initialized”
> Reference: 
> https://stackoverflow.com/questions/59203418/cassandra-java-driver-warning



--
This message was sent by Atlassian Jira
(v8.20.1#820001)