Hi,

Point 1: I have updated the impact section of the design doc with all the breaking changes for users: https://docs.google.com/document/d/1qsNksUJ_a6PL623iBZ-3QQDkFae81IKbqVQl1chsOfU/edit?usp=sharing

Point 2: We have run only integration tests so far. If you have suggestions, let us know and we will try to act on them.

Regards,
Satwik

From: Alexey Romanenko <aromanenko....@gmail.com>
Sent: Thursday, April 22, 2021 7:58 PM
To: dev@beam.apache.org
Subject: Re: [PROPOSAL] Upgrade Cassandra driver from 3.x to 4.x in CassandraIO

Thanks, it looks promising! I just have a couple of things to ask.

1) Could you briefly summarise, and add here and/or to the design doc, all the breaking changes for users that you expect (if any)? Can we avoid them, at least temporarily? For example, we used to deprecate an old public API and keep it for the next three Beam releases before removing it completely.

2) Also, did you run any load tests to compare the performance of the two driver versions on the same pipeline and datasets? If so, could you share the results, please?

--
Alexey

On 20 Apr 2021, at 07:47, D, Anup (Nokia - IN/Bangalore) <anu...@nokia.com> wrote:

Hi All,

Satwik and I have been working together on this. 4.x is a major revamp, and we have highlighted below the major differences we saw during this activity. Please review and provide feedback.

1. Package names
   3.x : com.datastax.cassandra
   4.x : com.datastax.oss
   Comment : The 4.x packages differ from 3.x, so we think both can co-exist. For reference, see JanusGraph, which has included both packages. [1]

2. Mapping
   3.x : The default object mapper took care of mapping all Entity types at runtime - org.apache.beam.sdk.io.cassandra.DefaultObjectMapper
   4.x : The mapper auto-generates helper classes at compile time by processing annotations on the Mapper, Dao and Entity. You then use either a specific Dao or a generic Dao to access/map classes. [2][3]
   Comment : With the objective of avoiding or limiting breaking changes, we found that providing a Generic/Base Dao via inheritance causes limited breakage. [4]
   Impacts :
   a. Requires a mapperFactoryFunction to be supplied (mandatory) that returns a SpecificDao reference.
   b. @GetEntity is the annotation that maps a ResultSet to an Entity, and it performs strict column checking between the two. This was not the case in 3.x. We have posted a query to the Cassandra community. [5]

3. HadoopFormatIO
   The unit test in HadoopFormatIO that interacts with Cassandra failed when the driver was upgraded to 4.x. The latest Apache Cassandra server still uses the 3.x Cassandra connector. There is an open JIRA. [6][7]

4. Load Balancing policy
   3.x : Providing the data center name is optional.
   4.x : Load balancing policies have been revamped. Providing the data center name is mandatory. [8]

5. Configuration
   3.x : This was done by configuring classes.
   4.x : Along with configuring classes, file-based configuration is supported. [9][10]
   Comment : We tested loading part of the configuration from a file and part programmatically. There is no impact as such, but this is a new, complementary feature.

6. Driver compatibility
   Driver versions 4.5+ are fully compatible with Apache Cassandra 2.1+. [11]
   The open source driver implementation “com.datastax.oss” will be supported for interacting with both open source and commercial Cassandra.
   There is no impact, but we are highlighting it.

[1] Update Cassandra driver to 4.x version · Issue #1510 · JanusGraph/janusgraph: https://github.com/JanusGraph/janusgraph/issues/1510
[2] https://stackoverflow.com/questions/34701817/what-is-the-most-efficient-way-to-map-transform-cast-a-cassandra-boundstatement
[3] https://docs.datastax.com/en/developer/java-driver/4.5/upgrade_guide/#object-mapper
[4] https://stackoverflow.com/questions/61298743/genericdao-on-datastax-java-driver-4
[5] Strict column checking in Datastax java driver 4 causing problems (Stack Overflow): https://stackoverflow.com/questions/66985742/strict-column-checking-in-datastax-java-driver-4-causing-problems
[6] https://issues.apache.org/jira/browse/CASSANDRA-15750
[7] https://javadoc.io/doc/org.apache.cassandra/cassandra-all/latest/org/apache/cassandra/hadoop/cql3/CqlInputFormat.html
[8] https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/load_balancing/
[9] https://github.com/datastax/java-driver/tree/4.0.0/upgrade_guide#configuration
[10] https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/configuration/
[11] https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/javaDrivers.html

Thanks
Anup

From: Alexey Romanenko <aromanenko....@gmail.com>
Sent: Friday, April 16, 2021 11:02 PM
To: dev@beam.apache.org
Subject: Re: [PROPOSAL] Upgrade Cassandra driver from 3.x to 4.x in CassandraIO

Thank you for the design doc and for starting a discussion on the mailing list!

I’m next after Kenn to ask about the potential breaking changes with this upgrade. Could you elaborate a bit on this? And can we support both versions at the same time?

Alexey

On 15 Apr 2021, at 12:32, S Bhandiwad, Satwik (Nokia - IN/Bangalore) <satwik.s_bhandi...@nokia.com> wrote:

Hi All,

We would like to upgrade the Cassandra driver version from 3.x to 4.x in the CassandraIO connector.

Design Document - https://docs.google.com/document/d/1qsNksUJ_a6PL623iBZ-3QQDkFae81IKbqVQl1chsOfU/edit?usp=sharing
Pull Request - https://github.com/apache/beam/pull/14457/

Please go through the design doc and PR and let us know your thoughts.

Regards,
Satwik
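
Illustration of the "Load Balancing policy" and "Configuration" points in Anup's summary: a minimal sketch of what a 4.x file-based configuration might look like. It assumes the driver's default behaviour of reading application.conf from the classpath. The contact point and the data center name "datacenter1" are placeholder values chosen for illustration and are not taken from the design doc or the PR.

```
# application.conf (HOCON), picked up from the classpath by the 4.x driver.
# All values below are placeholders for illustration only.
datastax-java-driver {
  # Node(s) used for the initial connection.
  basic.contact-points = [ "127.0.0.1:9042" ]

  # 4.x expects the local data center to be named when explicit
  # contact points are given; this was optional in 3.x.
  basic.load-balancing-policy.local-datacenter = datacenter1
}
```

The same data center setting can instead be supplied programmatically through CqlSession.builder().withLocalDatacenter(...), so part of the configuration can live in the file and part in code, as described in the Configuration item.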