Re: [Pharo-project] [squeak-dev] Re: Graph library in Smalltalk - Need for advices

Cédrick Béler Wed, 22 Dec 2010 06:28:33 -0800


> 
> Sorry for the late reply, I've been away on business. I'm going to have to
> make one of those unfortunate long posts, so hopefully I can hold the
> attention span of those already interested anyway.


Not so long and very interesting so please continue... :)
some comments below (remember I have not a lot of knowledge and experience on 
graphs)...

> 
> Cédrick - I found the main limitation with Lemon to be that it was a pain to
> extend as well as to integrate a nice persistence layer. 

ok. Pain to extend sounds bad... but as it seems to scale well, it could be a 
candidate for interoperability with a smalltalk lib...

> [...]

> It's been my experience after using probably 20 different graph libraries in
> real projects over the years that the following themes reoccur:

that's really why your opinion is really important !

> 
> 1. Library is good for visualization, but not for calculations - shortest
> path, centrality, MST, etc.
> 
> 2. Library is good for algorithms but has no visualization support

ok.
To me 1 and 2 should be dissociated anyway.

> 
> 3. Graph library relies too much on other libraries that are not that
> impressive either. Graphviz comes to mine. Great tool, but frankly the
> visualizations looks awful by today's standards and it's harder to do
> something fully dynamic. Piping a dot file to a unix process to produce a
> static image is not terribly useful in things like web applications.

I fully agree.

> 
> 4. Library does not scale predictably or to a certain point. 

This is an aspect I care not much as I said I only work on small scale project.
But this is definitely an important point. At least to know the limitations.

> 
> 5. Library does not support persistence properly.

A point I don't know.

> 
> 6. Library forces an API that takes over your data too much or doesn't play
> well with existing data. Any model that forces inheritance for Vertex and
> Edge classes comes to mind. See neo4j for some examples of a poor API on
> this point.

I think this is a really important point.
Is the mapping concept of lemon useful here  ?

> 
> 7. Library is good all around but lacks depth (missing algorithms, measures,
> etc) and does not play well with others.

entry point for extensions ? patterns ?

> 
> 8. Every library has its own idea of what a vertex and edge should be and
> there's no good way of passing data between them in a nice manner.

pass data between libs ?
Maybe once we get a lib in smalltalk, then other lib can be selected

> 
> 9. Visualization libraries have a multitude of issues - specific to desktop
> or web only, static output, no capability to annotate vertex or edges, no
> pruning... I have yet to find one that is both pretty and useful to this day
> without nearly recreating a new library each time.

ok.

> 
> The reality is every library will have to make compromises at a minimum in
> the following areas:
> 
> 1. Scalability
> 2. Data structures
> 3. Algorithms
> 4. Performance
> 5. Visualization

To me, (might be naive) we should have a nice and clean model for 2, 3 and 4. 
Performance is to me more about the complexity of an operation (ie. algorithm 
variations that rely on the chosen graph (internal) data structure).

Scalability (1) is really important but is another point, especially in 
Smalltalk. Would you use primitives, parallelism, multi-image, bridges to 
existing lib... , quantic vm (kidding :o) )?
What's more important is probably to have a good documentation stating what the 
limitations are.

Moreover, it seems 90% of interested people (here) in a graph lib are a priori 
interested in small scale apps.

Then another lib for 5 (might be mondrian based)...


> 
> I think you need to at least consider all the above when creating a library.

fully agree.
I guess we cannot claim to have the ultimate lib. But I would be very happy to 
have something smart "a la smalltalk" :)

> In fact, you probably should break it up as many libraries do into several
> components:
> 
> 1. Common, useful data structures (Vertex, Edge, Adjacency List, Adjaceny
> Matrix, etc.) that are optimized for your language - hopefully smalltalk

YES
but then what strategy to map objects ?

> 
> 2. Computations / Measurements

for instances ?

Where do you classify operations like moralization, variable elimination, 
junction tree transformation ? 
(yes it's directed to my need ;-) )

> 
> 3. Algorithms - layout, common graph problems, etc.
> 
> 4. Persistence - database or otherwise, likely several supporting libs
> specific to the persistence mechanism or implementing some kind of pattern
> to be agnostic.

would be great !

> The data structures should at least be designed in such a
> way that they are persistence friendly to put in something like Gemstone,
> Magma, Image, etc. See for example InfiniteGraph which has a Java interface
> built on Objectivity.

I guess gemstone (and other persistency players) could play a nice role here... 
:)

> 
> 5. Visualization - desktop, web, and static output.
> 
> The problem here is that creating a graph library is a huge undertaking. It
> might not sound like it, but they can grow to epic proportions. From my
> comments, you can see that it is really not the fault of any particular lib,
> but rather the subject matter. You could spend a lifetime packing in
> features. The real key is just to create a series of libs that work well
> together and not constantly reinvent the wheel with node classes in each
> library. 

yes I understand that :(  
To me, the problem is that graph is a too much generic term and involve too 
many possible processing that are not easily generalizable (cyclic/acyclic, 
directed/undirected, parallel/looping edges, a tree is a graph too...).
For instance, it's easier to me to implement a tree algorithm on a straight 
tree structure than on a particular graph being a tree.


> 
> A secondary problem is doing graph analysis and even drawing graphs can take
> a lot of horse power. Parallelism is a tough issue with graph libraries and
> there's a multitude of approaches from pure threads to map/reduce to random
> walking and stream processing.  This is further compounded by persistence
> where you need to start doing things like graph healing, partitioning, and
> sharding to scale to massive levels and maintain performance.

looks hard indeed.

> 
> I just want to mention to others that graphs are hugely useful in general to
> solve a variety of problems from recommendations to meta programming. It's
> not just pretty pictures and experiments with molecules. There's a lot of
> potential in Smalltalk to do something since generally I feel Smalltalkers
> aren't bound by or afraid of new approaches to old problems. Graph databases
> in many cases could replace relational DBs for example and let your app do
> all kinds of stuff you might never have thought of otherwise.
> 
> I could go on and on, but I'll stop myself here. Comments or thoughts?

My main thought is that I think you're clearly the most experienced guy here. 
That's why I'd like you to be the "chief architect" if we start a project 
around graphs.

Do you have some design ideas for:
Data structures (2)  with in mind Algorithms (3), Performance (4) and 
Scalability (1)
Again I think visualization (5) is another topic. It could come later...

Or in other word, could we start designing graph data structures as a project ? 
I'm pretty sure you have some strong opinions on their design so as not to 
sacrifice (too much) 3, 4 and 1 ;-). Or at least, you will know the limitations 
and consequences of design choices...

What do you think ? If you don't have enough time, you could just give 
directions and eventually review the code...

As Stephane said, this is not an urgent project and we don't have to build the 
ultimate lib...


Cheers,

Cédrick

> 
> 
> 
>>> 
>>> - data structure: directed graphs, undirected graphs, possible loop and
>>> parallel edged, ..., trees (?)
>>> - mapping: easily map objects, informations on nodes and/or edges (here,
>>> don't know if I'd like subclassing nodes/eges instead...)
>>> - iterators: efficient way to iterate over nodes and edges
>>> - algorithms: basic algorithms implementation (bfs, dfs, ..., shortest
>>> paths, ...), and plug-ability for specific ones...
>>> - visualization: having an interactive graph visualization web/SVG and
>>> eventually morphic (... graphviz, mondrian, .......)
>>> 
>>> then, I could use this for my research work...
>>> - I need "belief" nodes with associated conditional beliefs tables
>>> - I need to implement algorithms to propagate an information change
>>> (change of a node state) in any nodes... 
>>> ****mainly, I'd like to get junction trees from a graph [1] which are
>>> rely useful for several domains in machine learning field ***
>>> 
>>> Actually, I don't know if I really need a graph lib as a simple
>>> implementation directed to bayesian should be enough but it's the second
>>> time I need graphs (last time was for planification) and I think that
>>> would be great to have a nice and clean basic implementation.
>>> 
>>> Couldn't we start developing something similar to Lemon (regarding "API",
>>> enitites, etc...) that would work for small scale project project in
>>> smalltalk ?
>>

Re: [Pharo-project] [squeak-dev] Re: Graph library in Smalltalk - Need for advices

Reply via email to