Perhaps this is not directly relevant, but is the proposal you're mentioning 
the topic that was intended for discussion at the last community meeting?

On 2024/04/11 18:58:56 Sem wrote:
> Hello!
> 
> The current PySpark implementation has one serious problem: it won't
> work with the new Spark Connect, because it relies on `py4j` and the
> internal `_jvm` variable.
> 
> My suggestion is to rewrite the PySpark API from scratch in the
> following way:
> 
> 1. We will have pure-Python GraphInfo, EdgeInfo and VertexInfo classes
> 2. We will have pure-PySpark utilities (index generators)
> 3. We will use the Spark Scala data sources for reading and writing
> the GAR format from PySpark
> 
> It is a lot of work, but I'm committing to doing it and supporting it
> in the future as a PMC member of the project. Decoupling PySpark from
> Scala will also simplify Scala/Java development. Another benefit is
> that most of the actual logic in PySpark will live in Python code,
> which makes reading the source and debugging simpler for everyone who
> wants to work with the library.
> 
> A couple of additional dependencies will be introduced:
> 1. Pydantic, for working with YAML models of the Info objects (MIT
> License, pure Python)
> 2. PyYAML, for the same reason (MIT License, pure Python/Cython)
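> To make the Pydantic/PyYAML point concrete, here is a minimal sketch
> of what loading a YAML model into a pure-Python Info object could look
> like. All class and field names below are illustrative, not the final
> GraphAr schema:

```python
import yaml  # PyYAML
from pydantic import BaseModel

# Hypothetical shape of a VertexInfo YAML document; the real GraphAr
# schema will differ -- this only illustrates the Pydantic + PyYAML flow.
class VertexInfo(BaseModel):
    label: str
    chunk_size: int
    prefix: str

raw = """
label: person
chunk_size: 100
prefix: vertex/person/
"""

# yaml.safe_load returns a plain dict; Pydantic validates field names
# and checks/coerces types when constructing the model.
info = VertexInfo(**yaml.safe_load(raw))
print(info.label, info.chunk_size)
```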
> 
> Overall it should be good for the project, because it will simplify
> testing of both parts (Spark and PySpark).
> 
> I see GraphAr PySpark being used not so much in production ETL jobs as
> in interactive development and ad-hoc analytics on graph data. Such
> analytics typically happens in Databricks notebooks (which do not
> provide access to `_jvm` on shared clusters) or in other tools that
> rely on Spark Connect (like the VS Code Spark Connect integration).
> So for that use case, Spark Connect support may be more important than
> it is for the Spark Scala part, which should be used for jobs rather
> than interactive development.
> 
> Thoughts?
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
