Re: [DISCUSS] phoenix-spark connector

Rejeb Ben Rejeb Sat, 31 Aug 2024 02:31:38 -0700

Ok, I'll create a ticket to remove V1 datasource code and do as discussed.
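
For reference, a minimal sketch of the compatibility class we discussed, not the actual patch. It assumes Spark resolves a package-style format name such as "org.apache.phoenix.spark" by looking for a class called DefaultSource in that package. The PhoenixDataSource below is a simplified, hypothetical stand-in for the real V2 class, and its shortName() method is only there so the example compiles on its own.

```java
// Hypothetical stand-in for the connector's V2 entry point. The real
// PhoenixDataSource implements Spark's DataSource V2 interfaces; only a
// short name is modelled here so the sketch is self-contained.
class PhoenixDataSource {
    public String shortName() {
        return "phoenix";
    }
}

// Compatibility shim. In the connector it would be declared in the old
// package (org.apache.phoenix.spark), so that jobs calling
// format("org.apache.phoenix.spark") keep resolving to a class named
// DefaultSource while all behaviour comes from the V2 datasource.
class DefaultSource extends PhoenixDataSource {
    // Intentionally empty: everything is inherited from PhoenixDataSource.
}
```

With this shape the up-to-date code stays where it is, and only the thin subclass lives under the old name.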

On Thu, Aug 29, 2024 at 13:08, Istvan Toth <[email protected]> wrote:


> On second thought that makes sense, we didn't touch the V1 driver when
> porting to Spark3.
> Your suggestion sounds good.
>
>
> On Thu, Aug 29, 2024 at 11:53 AM Rejeb Ben Rejeb <[email protected]>
> wrote:
>
> > Yes, it's correct.
> >
> > On Thu, Aug 29, 2024 at 11:12, Istvan Toth <[email protected]> wrote:
> >
> > > So the V1 code in Spark 3 still behaves as Spark 2 does, which is different
> > > from the V2 code in Spark 3.
> > > Do I understand correctly?
> > >
> > > On Thu, Aug 29, 2024 at 10:28 AM Rejeb Ben Rejeb <[email protected]> wrote:
> > >
> > > > Sorry, I explained it badly. I meant that spark3 will support both Append and
> > > > Overwrite modes, and both will behave the same way.
> > > > I agree that the new one is correct and that we shouldn't add support for
> > > > Overwrite mode.
> > > > I think that is a better option than keeping the old V1 code.
> > > > I don't think it is possible to gate the new behavior somehow by
> > > > overriding Spark internal code.
> > > >
> > > > On Thu, Aug 29, 2024 at 09:20, Istvan Toth <[email protected]> wrote:
> > > >
> > > > > On Wed, Aug 28, 2024 at 2:49 PM Rejeb Ben Rejeb <[email protected]> wrote:
> > > > >
> > > > > > On Wed, Aug 28, 2024 at 10:17, Istvan Toth <[email protected]> wrote:
> > > > > >
> > > > > > > On Mon, Aug 26, 2024 at 1:59 PM Rejeb Ben Rejeb <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > REMOVE DATASOURCE V1:
> > > > > > > > After removing the V1 code, it is possible to configure the V1 name as a V2
> > > > > > > > long name.
> > > > > > > > I have to test it to be sure, but this can be done by moving the
> > > > > > > > class PhoenixDataSource under the package "org.apache.phoenix.spark".
> > > > > > > > This way, it will have no impact on old applications which use the Spark
> > > > > > > > API.
> > > > > > > >
> > > > > > > Would creating a compatibility child of the driver under the old package
> > > > > > > name work?
> > > > > > > I don't like the idea of moving the up-to-date code to a new package.
> > > > > >
> > > > > > I did some tests with just moving PhoenixDataSource, and I think the behavior
> > > > > > has changed since the last time I worked on a connector.
> > > > > > The class now needs to be renamed to DefaultSource to make it work.
> > > > > > The best solution will be to make DefaultSource inherit from
> > > > > > PhoenixDataSource, which works and avoids moving and renaming classes.
> > > > > >
> > > > > Sounds good.
> > > > >
> > > > > > For the spark3 connector, there is a small change to make it accept
> > > > > > Overwrite mode, and it will behave the same as Append mode.
> > > > > > That's OK for me, since it is meant to maintain backward compatibility.
> > > > > >
> > > > > We've changed that once, I wouldn't change the behaviour again (especially
> > > > > as IMO the new one is correct).
> > > > > I think it would be best to gate that new behaviour behind an option.
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > > When I wrote my first message, I forgot that there are also helper methods
> > > > > > > > like "phoenixTableAsDataFrame" or "saveToPhoenix". For those we have two
> > > > > > > > options:
> > > > > > > >
> > > > > > > >    1. Assume that these methods are no longer maintained, document that the
> > > > > > > >    Spark API should be used instead, and remove them.
> > > > > > > >    2. Keep the methods and change their implementation to point to the V2
> > > > > > > >    datasource (all options of V1 are available with V2).
> > > > > > > >
> > > > > > > > Personally, I prefer option 1, since old Scala or Java applications
> > > > > > > > need code and dependency updates to use a newer version of the connector
> > > > > > > > anyway. Python or R applications will not be impacted, as they use the Spark
> > > > > > > > API.
> > > > > > > >
> > > > > > > While I agree with you from a technical POV, the reality is that there are
> > > > > > > a lot of legacy Spark jobs that I'd prefer not to break.
> > > > > > > Option 2 sounds better to me.
> > > > > > >
> > > > > > >
> > > > > > > > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS:
> > > > > > > > Yes, since the connector for Spark 2 was compiled with Scala 2.11, it can't
> > > > > > > > be run with Spark 2 compiled with Scala 2.12. The same applies to the Spark 3
> > > > > > > > connector with Scala 2.12 vs 2.13.
> > > > > > > > I meant to have this for later releases. IMHO, this is currently a
> > > > > > > > limitation, and it would be good to have the connector built with both
> > > > > > > > Scala versions, so that usage is not restricted to only one version of the
> > > > > > > > Spark build.
> > > > > > > > I've done some quick research, and it seems that there is a way to manage
> > > > > > > > this with the scala-maven-plugin through multiple executions instead of
> > > > > > > > using Maven profiles.
> > > > > > > >
> > > > > > > That sounds fine, please open a ticket, and a PR with your preferred
> > > > > > > solution.
> > > > > > >
> > > > > > OK I'll do it.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Rejeb
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Aug 26, 2024 at 08:49, Istvan Toth <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Forgive my ignorance of Spark:
> > > > > > > > >
> > > > > > > > > REMOVE DATASOURCE V1:
> > > > > > > > >
> > > > > > > > > IIRC the V1 and V2 datasources have different names.
> > > > > > > > > Wouldn't this break applications using the old V1 name ?
> > > > > > > > > Is there a chance that this would break old applications ?
> > > > > > > > >
> > > > > > > > > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS:
> > > > > > > > >
> > > > > > > > > Is this required because Scala 2.x runtimes are not backwards compatible?
> > > > > > > > > I don't see a problem with that.
> > > > > > > > >
> > > > > > > > > Its utility is limited until we start providing actual releases and
> > > > > > > > > publish binary artifacts, but in theory I agree.
> > > > > > > > >
> > > > > > > > > The implementation would be a bit tricky; the solution that comes to my
> > > > > > > > > mind is generating the artifacts in multiple Maven runs with different
> > > > > > > > > profiles, like we do for the different HBase profiles now.
> > > > > > > > >
> > > > > > > > > Istvan
> > > > > > > > >
> > > > > > > > > On Fri, Aug 23, 2024 at 7:36 PM Rejeb Ben Rejeb <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I would like to start a discussion about two changes to the
> > > > > > > > > > phoenix5-spark and phoenix5-spark3 connectors.
> > > > > > > > > >
> > > > > > > > > > REMOVE DATASOURCE V1
> > > > > > > > > > It is no longer necessary to keep the Datasource V1 classes, since all
> > > > > > > > > > features are implemented in the new connector version classes.
> > > > > > > > > > When fixing the issue PHOENIX-6783, I checked for impacts and made some
> > > > > > > > > > modifications to make removing the classes safe and without impact.
> > > > > > > > > >
> > > > > > > > > > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS
> > > > > > > > > > The phoenix5-spark2 connector uses spark-2.4.8, which is available with
> > > > > > > > > > Scala 2.11 and Scala 2.12.
> > > > > > > > > > Likewise, phoenix5-spark3 uses spark-3.2.4, which is available with
> > > > > > > > > > Scala 2.12 and Scala 2.13.
> > > > > > > > > >
> > > > > > > > > > It would be nice to have the connector support both Scala versions, like
> > > > > > > > > > other connectors, for example MongoDB or Cassandra.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Rejeb
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Rejeb Ben Rejeb
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > István Tóth | Sr. Staff Software Engineer
> > > > > > > Email: [email protected]
> > > > > > > cloudera.com <https://www.cloudera.com>


-- 
Regards,
Rejeb Ben Rejeb
