Dear Apache Arrow Dev Community,

My name is Bechir, and I am currently working on a project that involves implementing graph algorithms in Apache Arrow.
My initial plan was to define a node structure and then build a graph containing all the nodes. However, I quickly realized that, due to Apache Arrow's columnar format, this approach was not feasible.

I tried a couple of things, including implementing a shortest-path algorithm. However, I soon found that manipulating Arrow objects, particularly when applying graph algorithms, was more complex than anticipated, and it became clear that I would need data structures outside of what Arrow offers (e.g., a priority queue such as heapq wouldn't be possible using Arrow alone).

I also tried an approach similar to a certain SQL method (see: https://ibb.co/0rPGB42 ), but ran into roadblocks there too and ended up resorting to pandas for some transformations.

My next step is to experiment with a compressed sparse row (CSR) representation, hoping to perform matrix multiplication with it. Honestly, with what I know right now, I remain skeptical about its feasibility. Before committing to this approach, I would greatly appreciate your opinion based on your experience with Apache Arrow.

Thank you very much for your time. I look forward to potentially discussing this further.

Many thanks,
Bechir