Re: [sage-devel] Best way to store a database?

2014-09-29 Thread mmarco
Oh, and I forgot to mention: some of the invariants are only well defined 
up to multiplication by a unit. That means that querying for the knots with 
a given invariant requires not only checking for the ones with an identical 
value in that field, but also making a (potentially costly) comparison one 
by one. Either that, or finding some kind of unique normal form for them.
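
A minimal sketch of the normal-form idea, assuming the invariant is a 
one-variable Laurent polynomial with integer coefficients whose units are 
exactly +-t^k (as for the Alexander polynomial); other invariants would need 
their own rule. The dict-of-exponents representation and the names 
normal_form and normal_form_key are hypothetical, not part of Sage or the 
Knot Atlas.

```python
def normal_form(poly):
    """Return a canonical representative of poly modulo the units +-t^k.

    poly is a dict {exponent: coefficient} with integer keys and values
    (an assumed representation, chosen only for this sketch).
    """
    # Drop zero terms so they cannot affect the minimum exponent.
    terms = {e: c for e, c in poly.items() if c != 0}
    if not terms:
        return ()
    # Multiply by t^(-shift) so the lowest exponent becomes 0.
    shift = min(terms)
    # Multiply by -1 if needed so the lowest-degree coefficient is positive.
    sign = 1 if terms[shift] > 0 else -1
    return tuple(sorted((e - shift, sign * c) for e, c in terms.items()))


def normal_form_key(poly):
    """Serialize the normal form as a string suitable for an indexed column."""
    return ";".join(f"{e}:{c}" for e, c in normal_form(poly))


# Two representatives of the same class: t^-1 - 1 + t and -t^2 + t^3 - t^4.
a = {-1: 1, 0: -1, 1: 1}
b = {2: -1, 3: 1, 4: -1}
same_class = normal_form(a) == normal_form(b)
```

If the stored column holds such a key, an ordinary equality query would 
replace the one-by-one comparison.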

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To post to this group, send email to sage-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.


Re: [sage-devel] Best way to store a database?

2014-09-29 Thread mmarco
The way the information is stored in the downloadable Knot Atlas database 
is explained here:

http://katlas.math.toronto.edu/wiki/The_Take_Home_Database

I don't know about the SnapPy database, but I agree it would be cool to have 
it. Right now it goes beyond the scope of the knot/link module, but 
maybe in the future we could add 3- and 4-manifolds to Sage. That would be 
even harder work than the basic knots we are working on, though.




Re: [sage-devel] Best way to store a database?

2014-09-28 Thread Vincent Delecroix
2014-09-28 21:01 UTC+02:00, mmarco :
> Typically invariants are polynomials with integer coefficients (sometimes
> with negative or even rational exponents). I guess they could be converted
> to a dictionary (maybe represented by a list), but that doesn't sound like
> the kind of data structure that fits well in a database index.
>
> Maybe storing the hash of the invariants would be a good idea?

No. Store the polynomial as a string; it would be much simpler and
much more readable. At some point you might want to access the
database without having to guess what the data inside is.
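
A minimal sketch of the string approach, using the sqlite3 module from the
Python standard library. The table layout, the column names, the helper
knots_with and the textual form of the polynomials are assumptions made for
illustration; string equality only finds a match if every polynomial is
written in one agreed textual form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (knot, invariant name, invariant value).
conn.execute("CREATE TABLE invariants (knot TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO invariants VALUES (?, ?, ?)",
    [
        ("3_1", "alexander", "t^2 - t + 1"),
        ("4_1", "alexander", "t^2 - 3*t + 1"),
    ],
)


def knots_with(conn, name, value):
    """Return the knots whose invariant `name` is stored as the string `value`."""
    rows = conn.execute(
        "SELECT knot FROM invariants WHERE name = ? AND value = ? ORDER BY knot",
        (name, value),
    )
    return [knot for (knot,) in rows]


matches = knots_with(conn, "alexander", "t^2 - t + 1")
```

The stored values stay readable from any SQLite client, without Sage.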

Is the upstream database only a text file? Do you know how they do it in
SnapPy? There is a huge collection of 3-manifolds inside. It would be
cool to have something reasonably compatible.

Vincent



Re: [sage-devel] Best way to store a database?

2014-09-28 Thread mmarco
Typically invariants are polynomials with integer coefficients (sometimes 
with negative or even rational exponents). I guess they could be converted 
to a dictionary (maybe represented by a list), but that doesn't sound like 
the kind of data structure that fits well in a database index.

Maybe storing the hash of the invariants would be a good idea?




Re: [sage-devel] Best way to store a database?

2014-09-27 Thread William A Stein
On Sat, Sep 27, 2014 at 12:22 PM, mmarco  wrote:
>>
>> How long ?
>>
>
> Around five seconds on a very fast SSD.
>>
>>
>>
>> What kind of Python/Sage objects do you want to store in the end?
>> Dictionaries of strings (easy to store in an SQL table) or less standard
>> Sage objects? What takes the most processing time: parsing the file or
>> creating suitable Sage objects from well-stored strings? Is the data in
>> the upstream RDF database organised as a graph, and will you use this
>> structure in your queries (in which case relational SQL may not be
>> appropriate; note that Python has libraries for dealing with RDF instead
>> of parsing it as raw text)? What kind of queries will the database deal
>> with? Is the data those queries need already stored in the upstream file,
>> or should you precompute new columns for better performance? Will the
>> queries involve testing properties of complex Sage objects, or just
>> looking for the existence of strings and comparing integers?
>
>
> The file basically contains lines of the form:
>
> knot name / invariant name / invariant value
>
> Of course the invariant values are stored as strings, but we need to convert
> them to objects like polynomials.
>
> The main idea I had was to be able to "identify" a knot. That is, given an
> arbitrary knot by the user, compute the corresponding invariants, and then
> look in the database for possible candidates (that is, knots with the same
> invariants). That means that we would need fast comparisons against the
> objects stored in the database.

Or if the data types are standard (e.g., integers, floats, etc.), then you
can build an index. This is a one-liner in SQLite and makes queries
super-fast.
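
A minimal sketch of that one-liner, assuming a hypothetical table of
integer-valued invariants; the table, column and index names are made up
for illustration. EXPLAIN QUERY PLAN is used only to check that the query
planner reports using the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of integer-valued invariants.
conn.execute("CREATE TABLE invariants (knot TEXT, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO invariants VALUES (?, ?, ?)",
    [("3_1", "determinant", 3), ("4_1", "determinant", 5)],
)

# The one-liner: index the columns that appear in the WHERE clause.
conn.execute("CREATE INDEX idx_name_value ON invariants (name, value)")

query = "SELECT knot FROM invariants WHERE name = ? AND value = ?"
found = conn.execute(query, ("determinant", 5)).fetchall()

# Ask SQLite how it would run the query; the last column is the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("determinant", 5)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

Here plan_text should mention idx_name_value, which indicates an index
search rather than a full table scan.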

William

>
> Of course it could also be used in the opposite direction: to construct
> knots in Sage just by their identifier in the database.
>
>
>>
>>
>> Also, the answer may depend on whether the upstream database is evolving
>> fast and whether you will ensure long-term maintenance of the package.
>> Depending on this, an option could be to have a command within Sage that
>> fetches, parses/preprocesses (if there is a benefit), and stores the
>> database on the user's demand. This can be compatible with offering a
>> preprocessed package as well, if preprocessing takes more time than
>> fetching (like distributing sources vs. distributing binaries).
>
>
> I think the upstream database is essentially stabilized. Maybe at some point
> they could add more knots to the database, but that doesn't seem likely.
>>
>>
>> Ciao,
>> Thierry
>>



-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
wst...@uw.edu



Re: [sage-devel] Best way to store a database?

2014-09-27 Thread mmarco

>
>
> How long ? 
>
>
Around five seconds on a very fast SSD.

>
>
> What kind of Python/Sage objects do you want to store in the end?
> Dictionaries of strings (easy to store in an SQL table) or less standard
> Sage objects? What takes the most processing time: parsing the file or
> creating suitable Sage objects from well-stored strings? Is the data in
> the upstream RDF database organised as a graph, and will you use this
> structure in your queries (in which case relational SQL may not be
> appropriate; note that Python has libraries for dealing with RDF instead
> of parsing it as raw text)? What kind of queries will the database deal
> with? Is the data those queries need already stored in the upstream file,
> or should you precompute new columns for better performance? Will the
> queries involve testing properties of complex Sage objects, or just
> looking for the existence of strings and comparing integers?
>

The file basically contains lines of the form:

knot name / invariant name / invariant value

Of course the invariant values are stored as strings, but we need to 
convert them to objects like polynomials.

The main idea I had was to be able to "identify" a knot. That is, given an 
arbitrary knot by the user, compute the corresponding invariants, and then 
look in the database for possible candidates (that is, knots with the same 
invariants). That means that we would need fast comparisons against the 
objects stored in the database.

Of course it could also be used in the opposite direction: to construct 
knots in Sage just by their identifier in the database.
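
A minimal sketch of the "identify" lookup, assuming the simplified 
slash-separated line format described above rather than the real RDF syntax 
of the upstream file. The functions build_index and candidates, and the 
sample lines, are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def build_index(lines):
    """Map (invariant name, invariant value) to the set of knots having it."""
    index = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        # Split on the first two slashes only, so a value may contain "/".
        knot, name, value = (part.strip() for part in line.split("/", 2))
        index[(name, value)].add(knot)
    return index


def candidates(index, computed):
    """Return the knots that match every invariant in the dict `computed`."""
    sets = [index.get(item, set()) for item in computed.items()]
    return set.intersection(*sets) if sets else set()


# Illustrative sample in the assumed format (values kept as strings).
sample_lines = [
    "3_1 / crossings / 3",
    "3_1 / determinant / 3",
    "4_1 / crossings / 4",
    "4_1 / determinant / 5",
    "5_1 / crossings / 5",
    "5_1 / determinant / 5",
]
index = build_index(sample_lines)
possible = candidates(index, {"determinant": "5"})
```

Each additional invariant passed to candidates narrows the set further, so 
cheap invariants could be tried before expensive ones.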

 

>
> Also, the answer may depend on whether the upstream database is evolving 
> fast and whether you will ensure long-term maintenance of the package. 
> Depending on this, an option could be to have a command within Sage that 
> fetches, parses/preprocesses (if there is a benefit), and stores the 
> database on the user's demand. This can be compatible with offering a 
> preprocessed package as well, if preprocessing takes more time than 
> fetching (like distributing sources vs. distributing binaries). 
>

I think the upstream database is essentially stabilized. Maybe at some point 
they could add more knots to the database, but that doesn't seem likely. 

>
> Ciao, 
> Thierry 
>
>



Re: [sage-devel] Best way to store a database?

2014-09-27 Thread William A Stein
On Saturday, September 27, 2014, Travis Scrimshaw 
wrote:

>
> It is, but in a totally outdated version. I am not sure whether there is
>> a ticket for upgrading it.
>>
>>
We should delete the optional package entirely.  Just encourage people to
pip install it instead. The same goes for every optional package that is on
PyPI.



> Quite true. What we've done for FindStat is use easy_install in the Sage
> shell to upgrade it after installing the optional spkg (and deleting the
> old egg information).
>
> Best,
> Travis
>


-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
wst...@uw.edu



Re: [sage-devel] Best way to store a database?

2014-09-27 Thread Thierry
Hi,

On Fri, Sep 26, 2014 at 08:48:13AM -0700, mmarco wrote:
> I am working on an optional package for the knot atlas. The idea is to 
> download the database of knots and links and be able to query it.
> 
> I have downloaded a 300+ MB .rdf file from the Knot Atlas web page. Just 
> parsing it takes a bit.

How long?

> Which would be the right format to store that information in a way
> that is fast to access? Or is it better to just use the file provided
> by upstream and parse it every time?

What kind of Python/Sage objects do you want to store in the end?
Dictionaries of strings (easy to store in an SQL table) or less standard
Sage objects? What takes the most processing time: parsing the file or
creating suitable Sage objects from well-stored strings? Is the data in
the upstream RDF database organised as a graph, and will you use this
structure in your queries (in which case relational SQL may not be
appropriate; note that Python has libraries for dealing with RDF instead
of parsing it as raw text)? What kind of queries will the database deal
with? Is the data those queries need already stored in the upstream file,
or should you precompute new columns for better performance? Will the
queries involve testing properties of complex Sage objects, or just
looking for the existence of strings and comparing integers?

Also, the answer may depend on whether the upstream database is evolving
fast and whether you will ensure long-term maintenance of the package.
Depending on this, an option could be to have a command within Sage that
fetches, parses/preprocesses (if there is a benefit), and stores the
database on the user's demand. This can be compatible with offering a
preprocessed package as well, if preprocessing takes more time than
fetching (like distributing sources vs. distributing binaries).
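
A minimal sketch of the parse-and-store-on-demand part of this idea (the
fetching step is left out). The function open_database, the table layout
and the parse callable are hypothetical; parse stands in for whatever
reads the upstream file.

```python
import os
import sqlite3


def open_database(db_path, source_path, parse):
    """Open the SQLite cache at db_path, building it from source_path if missing.

    parse is a callable that takes the source file path and returns an
    iterable of (knot, name, value) triples.
    """
    if os.path.exists(db_path):
        # Already preprocessed: skip the expensive parsing step.
        return sqlite3.connect(db_path)
    # Build into a temporary file so an interrupted run leaves no partial cache.
    tmp_path = db_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    conn = sqlite3.connect(tmp_path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE invariants (knot TEXT, name TEXT, value TEXT)")
        conn.executemany(
            "INSERT INTO invariants VALUES (?, ?, ?)", parse(source_path)
        )
    conn.close()
    os.replace(tmp_path, db_path)
    return sqlite3.connect(db_path)
```

The first call pays the parsing cost once; later calls only open the
existing file. A preprocessed package would amount to shipping that file.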

Ciao,
Thierry



Re: [sage-devel] Best way to store a database?

2014-09-26 Thread William A Stein
On Friday, September 26, 2014, Julien Puydt 
wrote:

> Hi,
>
> On 26/09/2014 22:18, mmarco wrote:
>
>> So I take it this would require sqlite as a dependency? Or is it already
>> shipped with Sage by default?
>>
>
> It is shipped (quite recent),


>
To clarify: we have included sqlite with Sage since at least 2007.

>
> and with python bindings (sqlalchemy -- that one a little old).


Just to clarify: Python itself has built-in bindings to sqlite, which of
course are built as part of Sage. SQLAlchemy is NOT "Python bindings to
sqlite". Instead it is an ORM: a library that maps Python objects to
relational database data. I highly recommend NOT using SQLAlchemy for
such a specific high-performance math research application. It is aimed
much more at things like web applications...

I do very highly recommend just going to the sqlite website and learning
about it directly.  It's often fastest to create a CSV file (say) and
directly read it in rather than using the python interface.

Sqlite is a fantastic high quality piece of software when it makes sense to
use it.


>
> Snark on #sagemath
>
>


-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
wst...@uw.edu



Re: [sage-devel] Best way to store a database?

2014-09-26 Thread John Cremona
On 26 September 2014 16:55, William A Stein  wrote:
> On Fri, Sep 26, 2014 at 8:48 AM, mmarco  wrote:
>> I am working on an optional package for the knot atlas. The idea is to
>> download the database of knots and links and be able to query it.
>>
>> I have downloaded a 300+ MB .rdf file from the Knot Atlas web page. Just
>> parsing it takes a bit. Which would be the right format to store that
>> information in a way that is fast to access? Or is it better to just use the
>> file provided by upstream and parse it every time?
>
> It depends on what you're doing. If you can use SQLite [1], then it
> is pretty much the optimal solution to the above problem. (I say
> "can use", since SQLite is awesome when it is applicable to a problem,
> but it might not be, since it doesn't use a client/server model.)
> Python has excellent SQLite support. I usually just use [2]
> directly, though there are object-relational mappers (I don't
> recommend those if you care about performance).
>
> [1] http://www.sqlite.org/
>
> [2] https://docs.python.org/2/library/sqlite3.html
>

If you want to see a model for this, see sage/databases/cremona.py,
but please note that I didn't write it and don't myself know SQLITE.
The code there includes functions for creating the database from plain
text files (which I maintain separately) as well as the querying
interface.  The former is very good to include, since it means that I
can update the database (which goes into database_cremona_ellcurve)
without knowing what it is actually doing behind the scenes -- the
instructions are in the spkg documentation files.

There are several other things in sage/databases/  which you should
look at, including sql_db.py which claims

"This module implements classes (SQLDatabase and SQLQuery (pythonic
implementation for the user with little or no knowledge of sqlite))
that wrap the basic functionality of sqlite."

It was (re)written by Andrew Ohana so should be good!  (It was he who
wrote cremona.py too).

John




Re: [sage-devel] Best way to store a database?

2014-09-26 Thread William A Stein
On Fri, Sep 26, 2014 at 8:48 AM, mmarco  wrote:
> I am working on an optional package for the knot atlas. The idea is to
> download the database of knots and links and be able to query it.
>
> I have downloaded a 300+ MB .rdf file from the Knot Atlas web page. Just
> parsing it takes a bit. Which would be the right format to store that
> information in a way that is fast to access? Or is it better to just use the
> file provided by upstream and parse it every time?

It depends on what you're doing. If you can use SQLite [1], then it
is pretty much the optimal solution to the above problem. (I say
"can use", since SQLite is awesome when it is applicable to a problem,
but it might not be, since it doesn't use a client/server model.)
Python has excellent SQLite support. I usually just use [2]
directly, though there are object-relational mappers (I don't
recommend those if you care about performance).

[1] http://www.sqlite.org/

[2] https://docs.python.org/2/library/sqlite3.html

-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org
wst...@uw.edu



[sage-devel] Best way to store a database?

2014-09-26 Thread mmarco
I am working on an optional package for the Knot Atlas. The idea is to 
download the database of knots and links and be able to query it.

I have downloaded a 300+ MB .rdf file from the Knot Atlas web page. Just 
parsing it takes a while. Which would be the right format to store that 
information in a way that is fast to access? Or is it better to just use 
the file provided by upstream and parse it every time?
