Perhaps this is not directly relevant, but is the proposal you're mentioning the topic that was intended for discussion at the last community meeting?
On 2024/04/11 18:58:56 Sem wrote:
> Hello!
>
> The current PySpark implementation has one serious problem: it won't
> work with the new Spark Connect, because it relies on `py4j` and the
> internal `_jvm` variable.
>
> My suggestion is to rewrite the PySpark API from scratch in the
> following way:
>
> 1. We will have pure-Python GraphInfo, EdgeInfo and VertexInfo
> 2. We will have pure PySpark utils (index generators)
> 3. We will use the Spark Scala datasources for writing and reading the
>    GAR format from PySpark
>
> It is a lot of work, but I'm committing to do it and to support it in
> the future as a PMC member of the project. Decoupling PySpark from
> Scala will also simplify Scala/Java development. Another benefit is
> that the actual logic in PySpark will live mostly in Python code, which
> simplifies reading the source and debugging for everyone who wants to
> work with the library.
>
> A couple of additional dependencies will be introduced:
> 1. Pydantic for working with the YAML models of Info objects (MIT
>    License, pure Python)
> 2. PyYAML for the same reason (MIT License, pure Python/Cython)
>
> Overall it should be good for the project, because it will simplify
> testing of both parts (Spark and PySpark).
>
> I see GraphAr PySpark used mostly not in production ETL jobs but in
> interactive development and ad-hoc analytics on graph data. Typically
> such analytics happens in Databricks Notebooks (which do not provide
> access to `_jvm` on shared clusters) or in other tools (like the
> VS Code Spark Connect integration) that rely on Spark Connect. So, for
> that use case, support for Spark Connect may be more important than it
> is for the Spark Scala part, which should be used for jobs rather than
> interactive development.
>
> Thoughts?
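For what it's worth, the Info-model part of the proposal could be sketched roughly like the following, assuming Pydantic and PyYAML as proposed in the thread. All class and field names here are illustrative only, not GraphAr's actual YAML schema:

```python
# Hypothetical sketch: a pure-Python Info object backed by Pydantic,
# parsed from YAML via PyYAML. No py4j / _jvm involvement at all.
import yaml
from pydantic import BaseModel


class PropertyModel(BaseModel):
    # Illustrative fields; the real GraphAr schema may differ.
    name: str
    data_type: str
    is_primary: bool = False


class VertexInfoModel(BaseModel):
    label: str
    chunk_size: int
    prefix: str
    properties: list[PropertyModel] = []

    @classmethod
    def from_yaml(cls, text: str) -> "VertexInfoModel":
        # yaml.safe_load turns the YAML document into plain dicts/lists;
        # Pydantic then validates field types and fills in defaults.
        return cls(**yaml.safe_load(text))


sample = """
label: person
chunk_size: 1024
prefix: vertex/person/
properties:
  - name: id
    data_type: int64
    is_primary: true
"""

info = VertexInfoModel.from_yaml(sample)
print(info.label, info.chunk_size, info.properties[0].name)
```

Actual reading and writing would then presumably go through `spark.read.format(...)` / `df.write.format(...)` against the Scala datasource, which works over Spark Connect precisely because it never touches `_jvm` from Python.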
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
