Hey Adam, Thank you very much for taking the time to respond and for the suggestions :)
I am aiming to assess the viability of Apache Arrow for graph algorithms and data structures, rather than addressing a particular issue. I am interested to determine the extent to which I can utilize Apache Arrow for this purpose and what potential challenges may arise. and since I have already encountered some roadblocks early on in my evaluation of Apache Arrow for graph algorithms and data structures, I am very interested in hearing your point of view on this topic. Thanks, Bechir On Fri, Jun 30, 2023, 13:38 Adam Lippai <a...@rigo.sk> wrote: > Hi, > > I’d recommend integrating with GraphBlas/suitesparse which are sparse > matrix multiplication based algorithms for common graph problems. There > might be an overlap in the data structures used, eg creating and a new > arrow type like the recent tensor type addition, or simply creating zero > copy or memcopy based efficient converters would be something worth > exploring. > > Best regards, > Adam alippai > > On Fri, Jun 30, 2023 at 06:57 Bechir Ben Daadouch <bechirche...@gmail.com> > wrote: > > > Thank you for taking the time to answer :) > > > > I don't have a fix Use-Case, but I am trying yo build a POC and evaluate > > whether Apache Arrow could be adequate in the context of graphs. But I > > found out very quickly that I won't be able to do all the necessary > > algorithm steps using only Apache Arrow without resorting to other > > libraries. > > > > On Fri, Jun 30, 2023, 07:36 Benson Muite <benson_mu...@emailplus.org> > > wrote: > > > > > On 6/30/23 04:21, Bechir Ben Daadouch wrote: > > > > Dear Apache Arrow Dev Community, > > > > > > > > My name is Bechir, I am currently working on a project that involves > > > > implementing graph algorithms in Apache Arrow. > > > > > > > > The initial plan was to construct a node structure and a subsequent > > graph > > > > that would encompass all the nodes. However, I quickly realized that > > due > > > to > > > > Apache Arrow's columnar format, this approach was not feasible. > > > > > > > > I tried a couple of things, including the implementation of the > > > > shortest-path algorithm. However, I rapidly discovered that > > manipulating > > > > arrow objects, particularly when applying graph algorithms, proved > more > > > > complex than anticipated and it became very clear that I would need > to > > > > resort to some data structures outside of what arrow offers (i.e.: > > Heapq > > > > wouldn't be possible using arrow). > > > > > > > > I also gave a shot at doing it similar to a certain SQL method (see: > > > > https://ibb.co/0rPGB42 ), but ran into some roadblocks there too > and I > > > > ended up having to resort to using Pandas for some transformations. > > > > > > > > My next course of action is to experiment with compressed sparse > rows, > > > > hoping to execute Matrix Multiplication using this method. But > > honestly, > > > > with what I know right now, I remain skeptical about the feasibility > > > > of it. However, > > > > before committing to this approach, I would greatly appreciate your > > > opinion > > > > based on your experience with Apache Arrow. > > > > > > > > Thank you very much for your time. > > > > > > > > Looking forward to potentially discussing this further. > > > > > > > > Many thanks, > > > > Bechir > > > > > > > Arrow may not be the best choice for most graph algorithms as they > > > typically require random memory accesses that will be difficult to > > > coalesce into forms that allow for vectorization. If your data will fit > > > in memory of a single node, you might consider: > > > https://github.com/DrTimothyAldenDavis/GraphBLAS > > > https://pypi.org/project/python-graphblas/ > > > https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl > > > > > >