Hey Adam,

Thank you very much for taking the time to respond and for the suggestions
:)

I am aiming to assess the viability of Apache Arrow for graph algorithms
and data structures, rather than addressing a particular issue. I am
interested to determine the extent to which I can utilize Apache Arrow for
this purpose and what potential challenges may arise.

and since I have already encountered some roadblocks early on in my
evaluation of Apache Arrow for graph algorithms and data structures, I am
very interested in hearing your point of view on this topic.

Thanks,
Bechir

On Fri, Jun 30, 2023, 13:38 Adam Lippai <a...@rigo.sk> wrote:

> Hi,
>
> I’d recommend integrating with GraphBlas/suitesparse which are sparse
> matrix multiplication based algorithms for common graph problems. There
> might be an overlap in the data structures used, eg creating and a new
> arrow type like the recent tensor type addition, or simply creating zero
> copy or memcopy based efficient converters would be something worth
> exploring.
>
> Best regards,
> Adam alippai
>
> On Fri, Jun 30, 2023 at 06:57 Bechir Ben Daadouch <bechirche...@gmail.com>
> wrote:
>
> > Thank you for taking the time to answer :)
> >
> > I don't have a fix Use-Case, but I am trying yo build a POC and evaluate
> > whether Apache Arrow could be adequate in the context of graphs. But I
> > found out very quickly that I won't be able to do all the necessary
> > algorithm steps using only Apache Arrow without resorting to other
> > libraries.
> >
> > On Fri, Jun 30, 2023, 07:36 Benson Muite <benson_mu...@emailplus.org>
> > wrote:
> >
> > > On 6/30/23 04:21, Bechir Ben Daadouch wrote:
> > > > Dear Apache Arrow Dev Community,
> > > >
> > > > My name is Bechir, I am currently working on a project that involves
> > > > implementing graph algorithms in Apache Arrow.
> > > >
> > > > The initial plan was to construct a node structure and a subsequent
> > graph
> > > > that would encompass all the nodes. However, I quickly realized that
> > due
> > > to
> > > > Apache Arrow's columnar format, this approach was not feasible.
> > > >
> > > > I tried a couple of things, including the implementation of the
> > > > shortest-path algorithm. However, I rapidly discovered that
> > manipulating
> > > > arrow objects, particularly when applying graph algorithms, proved
> more
> > > > complex than anticipated and it became very clear that I would need
> to
> > > > resort to some data structures outside of what arrow offers (i.e.:
> > Heapq
> > > > wouldn't be possible using arrow).
> > > >
> > > > I also gave a shot at doing it similar to a certain SQL method (see:
> > > > https://ibb.co/0rPGB42 ), but ran into some roadblocks there too
> and I
> > > > ended up having to resort to using Pandas for some transformations.
> > > >
> > > > My next course of action is to experiment with compressed sparse
> rows,
> > > > hoping to execute Matrix Multiplication using this method. But
> > honestly,
> > > > with what I know right now, I remain skeptical about the feasibility
> > > > of it. However,
> > > > before committing to this approach, I would greatly appreciate your
> > > opinion
> > > > based on your experience with Apache Arrow.
> > > >
> > > > Thank you very much for your time.
> > > >
> > > > Looking forward to potentially discussing this further.
> > > >
> > > > Many thanks,
> > > > Bechir
> > > >
> > > Arrow may not be the best choice for most graph algorithms as they
> > > typically require random memory accesses that will be difficult to
> > > coalesce into forms that allow for vectorization. If your data will fit
> > > in memory of a single node, you might consider:
> > > https://github.com/DrTimothyAldenDavis/GraphBLAS
> > > https://pypi.org/project/python-graphblas/
> > > https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl
> > >
> >
>

Reply via email to