Hi,

REMOVE DATASOURCE V1:
After removing the V1 code, the V1 name can be configured as the V2 long
name.
I still have to test it to be sure, but this can be done by moving the
PhoenixDataSource class under the package "org.apache.phoenix.spark".
That way there is no impact on old applications that use the Spark
API.
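To make the name question concrete, here is a minimal sketch (not the actual Spark or Phoenix code) of the fallback chain Spark-style datasource lookup follows: a registered short name first, then the format name as a class, then the old "<name>.DefaultSource" convention. The map and class names below are illustrative stand-ins, not the real registry.

```scala
// Simplified model of Spark's datasource name resolution (assumption:
// real Spark uses ServiceLoader for DataSourceRegister short names;
// a plain Map stands in for it here).
object NameResolution {
  // Short names registered via DataSourceRegister (simulated).
  private val shortNames =
    Map("phoenix" -> "org.apache.phoenix.spark.PhoenixDataSource")

  // Resolve a format name: short name first, then the class itself,
  // then "<name>.DefaultSource" (the old V1 convention).
  def lookup(name: String, knownClasses: Set[String]): Option[String] =
    shortNames.get(name)
      .orElse(Some(name).filter(knownClasses.contains))
      .orElse(Some(name + ".DefaultSource").filter(knownClasses.contains))
}
```

Which step matches for the old long name depends on where the relocated class ends up; the sketch is only meant to show the chain that the move has to satisfy.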
When I wrote my first message I forgot that there are also helper methods
like "phoenixTableAsDataFrame" and "saveToPhoenix"; for those we have two
options:

   1. Treat these methods as no longer maintained, document that the Spark
   API should be used instead, and remove them.
   2. Keep the methods and change their implementation to delegate to the
   V2 datasource (all V1 options are available in V2).
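If option 2 were chosen, the helpers would become thin wrappers over the V2 datasource. A hypothetical sketch (the option keys "table" and "zkUrl" are assumptions, and the actual spark.read call is left as a comment since it needs a SparkSession):

```scala
// Sketch of option 2: keep the old helper signature but delegate to the
// V2 datasource. Only the option translation is modeled here; the real
// wrapper would hand the map to spark.read.format("phoenix").
object V2Delegation {
  // Translate the old helper's arguments into V2 datasource options
  // (key names assumed for illustration).
  def buildV2ReadOptions(table: String, zkUrl: String): Map[String, String] =
    Map("table" -> table, "zkUrl" -> zkUrl)

  // The real helper would then do something like (pseudo):
  //   spark.read.format("phoenix").options(buildV2ReadOptions(t, z)).load()
}
```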

Personally, I prefer option 1: old Scala or Java applications need code and
dependency updates to use a newer version of the connector anyway. Python
and R applications will not be impacted, as they use the Spark API.

BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS:
Yes, since the connector for Spark 2 was compiled with Scala 2.11, it can't
run against a Spark 2 build compiled with Scala 2.12. The same applies to
the Spark 3 connector with Scala 2.12 vs 2.13.
I meant this for later releases. IMHO this is currently a limitation, and
it would be good to have the connector built with both Scala versions so
usage is not restricted to only one Scala build of Spark.
I've done some quick research; it seems there is a way to manage this
with the scala-maven-plugin through multiple executions instead of using
Maven profiles.
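I haven't verified this yet, but the multiple-executions idea might look roughly like the fragment below (untested sketch; a real setup would also need separate output directories and _2.12/_2.13 artifact suffixes):

```xml
<!-- Untested sketch: two scala-maven-plugin executions, one per Scala
     binary version. Version numbers are placeholders. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>compile-scala-2.12</id>
      <goals><goal>compile</goal></goals>
      <configuration>
        <scalaVersion>2.12.17</scalaVersion>
      </configuration>
    </execution>
    <execution>
      <id>compile-scala-2.13</id>
      <goals><goal>compile</goal></goals>
      <configuration>
        <scalaVersion>2.13.8</scalaVersion>
      </configuration>
    </execution>
  </executions>
</plugin>
```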

Rejeb


On Mon, Aug 26, 2024 at 08:49, Istvan Toth <st...@apache.org> wrote:

> Hi,
>
> Forgive my ignorance of Spark:
>
> REMOVE DATASOURCE V1:
>
> IIRC the V1 and V2 datasources have different names.
> Wouldn't this break applications using the old V1 name?
> Is there a chance that this would break old applications?
>
> BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS:
>
> Is this required because Scala 2.x runtimes are not backwards compatible?
> I don't see a problem with that.
>
> Its utility is limited until we start providing actual releases and publish
> binary artifacts, but
> in theory I agree.
>
> The implementation would be a bit tricky, the solution that comes to my
> mind is generating the artifacts
> in multiple maven runs with different profiles, like we do for the
> different HBase profiles now.
>
> Istvan
>
> On Fri, Aug 23, 2024 at 7:36 PM Rejeb Ben Rejeb <benrejebre...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I would like to start a discussion about two changes to the
> phoenix5-spark
> > and phoenix5-spark3.
> >
> > REMOVE DATASOURCE V1
> > It is no longer necessary to keep the Datasource V1 classes, since all
> > features are implemented in the new connector version classes.
> > When fixing PHOENIX-6783, I checked for impacts and made some
> > modifications so the classes can be removed safely, without impact.
> >
> > BUILD ARTIFACTS WITH DIFFERENT SCALA VERSIONS
> > The phoenix5-spark2 connector uses spark-2.4.8, which is available with
> > Scala 2.11 and Scala 2.12.
> > Likewise, phoenix5-spark3 uses spark-3.2.4, which is available with
> Scala 2.12
> > and Scala 2.13.
> >
> > It would be nice to have the connector support both Scala versions,
> like other
> > connectors do, for example MongoDB or Cassandra.
> >
> > Thanks,
> > Rejeb
> >
>


-- 
Regards,
Rejeb Ben Rejeb
