Exactly! It is the topic I wanted to discuss at the latest community meeting!
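
To make item 1 of the quoted proposal (pure Python Info objects) a bit more concrete, here is a rough sketch of a VertexInfo that needs neither `py4j` nor `_jvm`. It is only an illustration under loud assumptions: stdlib dataclasses stand in for the Pydantic models mentioned in the proposal, a plain dict stands in for the parsed YAML, and every field name (`label`, `chunk_size`, `properties`, `is_primary`) is hypothetical rather than taken from the actual GAR schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PropertyInfo:
    """One vertex property (hypothetical fields, not the real GAR schema)."""
    name: str
    data_type: str
    is_primary: bool = False


@dataclass
class VertexInfo:
    """Pure Python description of a vertex type; no JVM objects involved."""
    label: str
    chunk_size: int
    properties: List[PropertyInfo] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "VertexInfo":
        # `raw` plays the role of an already-parsed YAML document.
        props = [PropertyInfo(**p) for p in raw.get("properties", [])]
        return cls(
            label=raw["label"],
            chunk_size=int(raw["chunk_size"]),
            properties=props,
        )

    def primary_key(self) -> str:
        # Return the name of the first property flagged as primary.
        for prop in self.properties:
            if prop.is_primary:
                return prop.name
        raise ValueError(f"vertex '{self.label}' has no primary property")


# Hypothetical input, shaped like what a YAML loader might return.
raw_person = {
    "label": "person",
    "chunk_size": 100,
    "properties": [
        {"name": "id", "data_type": "int64", "is_primary": True},
        {"name": "firstName", "data_type": "string"},
    ],
}

person = VertexInfo.from_dict(raw_person)
```

Because such an object is plain Python, it can be built and inspected in a Spark Connect session or a shared-cluster notebook without touching the JVM; only the actual reading and writing would go through the Scala datasources.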

On Fri, 2024-04-12 at 08:23 +0000, Weibin Zeng wrote:
> Perhaps this is not directly relevant, but is the proposal you're
> mentioning the topic that was intended for discussion at the last
> community meeting?
>
> On 2024/04/11 18:58:56 Sem wrote:
> > Hello!
> >
> > The current PySpark implementation has one serious problem: it won't
> > work with the new Spark Connect because it relies on `py4j` and an
> > internal `_jvm` variable.
> >
> > My suggestion is to rewrite the PySpark API from scratch in the
> > following way:
> >
> > 1. We will have pure Python GraphInfo, EdgeInfo and VertexInfo
> > 2. We will have pure PySpark utils (index generators)
> > 3. We will use the Spark Scala datasources for writing and reading
> >    in GAR format from PySpark
> >
> > It is a lot of work, but I'm committing to do it and to support it
> > in the future as a PMC member of the project. Decoupling PySpark
> > from Scala will also simplify Scala/Java development. Another good
> > point is that the actual logic in PySpark will be mostly in Python
> > code, which simplifies reading the source code and debugging for
> > everyone who wants to work with the library.
> >
> > A couple of additional dependencies will be introduced:
> > 1. Pydantic for working with YAML models of Info objects (MIT
> >    License, pure Python)
> > 2. PyYAML for the same reason (MIT License, pure Python/Cython)
> >
> > Overall it should be good for the project, because it will simplify
> > testing of both parts (Spark and PySpark).
> >
> > I see GraphAr PySpark being used mostly not in production ETL jobs
> > but in interactive development and ad-hoc analytics on graph data.
> > Such analytics typically happens in Databricks Notebooks (which do
> > not provide access to `_jvm` in shared clusters) or in other tools
> > (like VSCode spark-connect) that rely on Spark Connect. So, for
> > that case, support of Spark Connect may be more important than for
> > the Spark Scala part, which should be used for jobs rather than
> > interactive development.
> >
> > Thoughts?

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
