Hi,

I’d recommend integrating with GraphBlas/suitesparse which are sparse
matrix multiplication based algorithms for common graph problems. There
might be an overlap in the data structures used, eg creating and a new
arrow type like the recent tensor type addition, or simply creating zero
copy or memcopy based efficient converters would be something worth
exploring.

Best regards,
Adam alippai

On Fri, Jun 30, 2023 at 06:57 Bechir Ben Daadouch <bechirche...@gmail.com>
wrote:

> Thank you for taking the time to answer :)
>
> I don't have a fix Use-Case, but I am trying yo build a POC and evaluate
> whether Apache Arrow could be adequate in the context of graphs. But I
> found out very quickly that I won't be able to do all the necessary
> algorithm steps using only Apache Arrow without resorting to other
> libraries.
>
> On Fri, Jun 30, 2023, 07:36 Benson Muite <benson_mu...@emailplus.org>
> wrote:
>
> > On 6/30/23 04:21, Bechir Ben Daadouch wrote:
> > > Dear Apache Arrow Dev Community,
> > >
> > > My name is Bechir, I am currently working on a project that involves
> > > implementing graph algorithms in Apache Arrow.
> > >
> > > The initial plan was to construct a node structure and a subsequent
> graph
> > > that would encompass all the nodes. However, I quickly realized that
> due
> > to
> > > Apache Arrow's columnar format, this approach was not feasible.
> > >
> > > I tried a couple of things, including the implementation of the
> > > shortest-path algorithm. However, I rapidly discovered that
> manipulating
> > > arrow objects, particularly when applying graph algorithms, proved more
> > > complex than anticipated and it became very clear that I would need to
> > > resort to some data structures outside of what arrow offers (i.e.:
> Heapq
> > > wouldn't be possible using arrow).
> > >
> > > I also gave a shot at doing it similar to a certain SQL method (see:
> > > https://ibb.co/0rPGB42 ), but ran into some roadblocks there too and I
> > > ended up having to resort to using Pandas for some transformations.
> > >
> > > My next course of action is to experiment with compressed sparse rows,
> > > hoping to execute Matrix Multiplication using this method. But
> honestly,
> > > with what I know right now, I remain skeptical about the feasibility
> > > of it. However,
> > > before committing to this approach, I would greatly appreciate your
> > opinion
> > > based on your experience with Apache Arrow.
> > >
> > > Thank you very much for your time.
> > >
> > > Looking forward to potentially discussing this further.
> > >
> > > Many thanks,
> > > Bechir
> > >
> > Arrow may not be the best choice for most graph algorithms as they
> > typically require random memory accesses that will be difficult to
> > coalesce into forms that allow for vectorization. If your data will fit
> > in memory of a single node, you might consider:
> > https://github.com/DrTimothyAldenDavis/GraphBLAS
> > https://pypi.org/project/python-graphblas/
> > https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl
> >
>

Reply via email to