Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Petite Abeille

On Mar 28, 2012, at 12:50 AM, Simon wrote:

>  - Look at SQLite's source code and try to implement analytical functions
> in a way that performs better than O(n^2), and contribute
> my findings on this topic back to the community.

Enhancing SQLite with analytics would be a major, and very valuable, 
achievement. I wish you success! :)



Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Darren Duncan

Simon Slavin wrote:

> On 27 Mar 2012, at 11:50pm, Simon wrote:
> 
>> Thank you all very much for all your answers, they have been most useful.
> 
> You're welcome.  Something else to consider is whether you should be doing
> this in C.  C and C++ can do everything, but they're not ideally suited to
> heavy mathematical operations.  You might want to consider using R (a free
> Matlab-like system available for pretty much every OS) which already has a
> ton of mathematical and graphical functions:
> 
> http://www.r-project.org/
> 
> I can tell you from experience, writing code to extract, rearrange, analyse
> and plot data is /far/ faster in R than it is in C, even if you don't know
> R very well.  R has a package RSQLite which gives it access to data stored
> in SQLite databases.  So you could write your data-gathering code in C, use
> that to feed the data into an SQLite database, then use R to do your
> analysis.  You might like to take a look at this:
> 
> If you're using C because you already have something else you want to
> interface with, sorry for wasting your time.


Something not yet mentioned: an alternative solution is to use Postgres 8.4+ 
(9.1 is the latest), which lets you do window functions directly in SQL.  I know 
that's not an SQLite solution, but it may be the least work to accomplish what 
you want, as it's still terse, declarative SQL, and it's also open source. -- Darren Duncan



Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Simon Slavin

On 27 Mar 2012, at 11:50pm, Simon  wrote:

> Thank you all very much for all your answers, they have been most useful.

You're welcome.  Something else to consider is whether you should be doing this 
in C.  C and C++ can do everything, but they're not ideally suited to heavy 
mathematical operations.  You might want to consider using R (a free Matlab-like 
system available for pretty much every OS) which already has a ton of 
mathematical and graphical functions:

http://www.r-project.org/

I can tell you from experience, writing code to extract, rearrange, analyse and 
plot data is /far/ faster in R than it is in C, even if you don't know R very 
well.  R has a package RSQLite which gives it access to data stored in SQLite 
databases.  So you could write your data-gathering code in C, use that to feed 
the data into an SQLite database, then use R to do your analysis.  You might 
like to take a look at this:



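As a very rough, untested sketch of the "feed the data into an SQLite database" 
step (the 'quotes' table and its column names here are made up -- adjust to your 
own schema), the C side is just open / prepare / bind / step:

  #include <sqlite3.h>

  /* Insert one day's prices; returns SQLITE_OK on success. */
  static int store_quote(sqlite3 *db, const char *date, double open,
                         double high, double low, double close)
  {
      sqlite3_stmt *stmt;
      const char *sql =
          "INSERT INTO quotes(date, opening_price, high_price,"
          " low_price, closing_price) VALUES (?,?,?,?,?)";
      int rc = sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);
      if (rc != SQLITE_OK) return rc;
      sqlite3_bind_text(stmt, 1, date, -1, SQLITE_TRANSIENT);
      sqlite3_bind_double(stmt, 2, open);
      sqlite3_bind_double(stmt, 3, high);
      sqlite3_bind_double(stmt, 4, low);
      sqlite3_bind_double(stmt, 5, close);
      rc = sqlite3_step(stmt);                /* expect SQLITE_DONE */
      sqlite3_finalize(stmt);
      return rc == SQLITE_DONE ? SQLITE_OK : rc;
  }

On the R side you would then open the same database file with RSQLite's 
dbConnect() and pull rows back with dbGetQuery() for the analysis.
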
If you're using C because you already have something else you want to interface 
with, sorry for wasting your time.

Simon.


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Simon
>
>   I would love to do an sql query that would look like the following
> ones.  I wonder if they are possible and valid applications for SQL and
> what would be the proper implementation for these.  I know I can calculate
> all this using C, but it would be most useful (to my later projects) to do
> it in SQL directly (my last example gives you an idea where I'm going).
>
> select max(opening_price, closing_price, high_price) - min(opening_price,
> closing_price, low_price) as day_range...
> select closing_price, moving_average(20,closing_price),
> exp_mov_avg(20,closing_price)...
> select closing_price, moving_average( funky_oscillator( closing_price )
> )...
>


I have described the problem that I was facing.
You guys provided all the pieces of the puzzle.
I now have to solve the problem.


Here's the path that I have decided to take for now, more or less in
priority:

  - Implement what I need now in plain C++, after the data is extracted
from SQLite.
  - Understand what analytical functions are, look at their implementations,
and attempt my own in plain C++.
  - Experiment with creating my own SQLite functions, aggregate functions and
virtual tables, as these will be instrumental to my current or future goals.
  - Look into the libraries you guys suggested for analyzing the data (I keep
this mostly last because my analytic skills are still minimal, so this could
be the most difficult path; I also see educational value in re-inventing the
wheel for now, but I understand this step may be required to move to higher
levels of analysis).
  - Look at SQLite's source code and try to implement analytical functions in
a way that performs better than O(n^2), and contribute my findings on this
topic back to the community.

Thank you all very much for all your answers, they have been most useful.

Simon


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Simon
> I think it is possible to create a custom aggregate function which would
> work on a cross-join of the data to analyse.  The cross-join makes all the
> data available to each bucket (group by Date, for example), and each
> bucket
> is basically one row of the whole data.  The aggregate function would thus
> work on all the data, and output its result in every row of the result
> set.  The aggregate function would be in charge of handling its own window
> (similar to the "varianceStep" function that ignores null values).

> You're going to find a big distinction in what SQLite calls 'aggregate
> functions'.  The built-in ones are here:
>
> http://www.sqlite.org/lang_aggfunc.html
>
> and you can write your own very sophisticated ones.  No problem with that:
> good luck and have fun.  But there are some things that can't be done within
> the grammar: functions which depend on the order of retrieved results, like
> some kinds of the 'moving_average()' function you mentioned in your
> original post.  This is because SQL tables have no inherent order: you can
> get all the values you want but you don't really know what order they'll
> show up in.  For that, you need your programming language, whatever it is.
>

I've been reading up on how these are implemented in SQLite.  That is how I
got the idea to use a cross-join.  But after thinking about it on the metro
ride home, I decided this was the least efficient approach.  Because SQLite
first assigns a series of rows to a bucket and then runs the aggregate
function on that bucket's rows, and since I intend to provide all rows to
all buckets (one bucket per row), the cost is O(n^2) or worse, depending on
how I implement my own calculations.  The only advantage of such a waste of
CPU would be being able to call it in an SQL query.  And that advantage only
matters if my intuition about this whole project pans out.

But thanks for the reminder about the importance of not relying on order!
It's something I might have tried to force using twisted queries!

Simon


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Simon Slavin

On 27 Mar 2012, at 9:47pm, Simon  wrote:

> But I don't think aggregates are the key here...  Basically, the kind of
> function I need is something like this:
>  For each row, in this column, calculate the foobar result on all (or a
> group of) the values of another column.
> I think this is the definition of windowing (I just read a few lines on the
> topic at the moment).
> 
> I think it is possible to create a custom aggregate function which would
> work on a cross-join of the data to analyse.  The cross-join makes all the
> data available to each bucket (group by Date, for example), and each bucket
> is basically one row of the whole data.  The aggregate function would thus
> work on all the data, and output its result in every row of the result
> set.  The aggregate function would be in charge of handling its own window
> (similar to the "varianceStep" function that ignores null values).

You're going to find a big distinction in what SQLite calls 'aggregate 
functions'.  The built-in ones are here:

http://www.sqlite.org/lang_aggfunc.html

and you can write your own very sophisticated ones.  No problem with that: good 
luck and have fun.  But there are some things that can't be done within the 
grammar: functions which depend on the order of retrieved results, like some 
kinds of the 'moving_average()' function you mentioned in your original post.  
This is because SQL tables have no inherent order: you can get all the values 
you want but you don't really know what order they'll show up in.  For that, 
you need your programming language, whatever it is.
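
To make that concrete, here is a bare-bones, untested sketch of an 
order-independent aggregate -- range(x) = max(x) - min(x) -- registered through 
sqlite3_create_function() ('quotes' below is just a placeholder table name):

  #include <sqlite3.h>

  typedef struct { double lo, hi; int n; } RangeCtx;

  static void rangeStep(sqlite3_context *ctx, int argc, sqlite3_value **argv)
  {
      RangeCtx *p = sqlite3_aggregate_context(ctx, sizeof(*p));
      (void)argc;
      if (p == 0 || sqlite3_value_type(argv[0]) == SQLITE_NULL) return;
      double v = sqlite3_value_double(argv[0]);
      if (p->n == 0 || v < p->lo) p->lo = v;
      if (p->n == 0 || v > p->hi) p->hi = v;
      p->n++;
  }

  static void rangeFinal(sqlite3_context *ctx)
  {
      RangeCtx *p = sqlite3_aggregate_context(ctx, 0);
      if (p && p->n > 0) sqlite3_result_double(ctx, p->hi - p->lo);
      else sqlite3_result_null(ctx);
  }

  /* after opening the database:
     sqlite3_create_function(db, "range", 1, SQLITE_UTF8, 0,
                             0, rangeStep, rangeFinal);
     then e.g.:  SELECT range(closing_price) FROM quotes;  */

It works precisely because it doesn't care what order the rows arrive in; a 
moving average is exactly the case where that assumption breaks down.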

Simon.


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Simon
>
> A DBMS is a good way to keep your raw data.  But I highly doubt that a
> majority of your analysis algorithms are going to be expressible in SQL
> without going way beyond the intended purpose of the language.


I think you are right, but in the cases where it can be expressed in SQL, it
can also be exploited from SQL, and that's what I'm after.
What I mean is simply being able to apply existing functions that work on
float values to the results of other functions.  Financial math is filled
with these...  For example, one guy develops the rate of change of a stock's
price (ROC), which is the basic difference between the price today and the
price X days ago.  Then another guy decides to smooth the result with a
moving average.  Yet another guy comes along later and divides the result by
some other result, and another smooths that through yet another moving
average, and so on.
I understand there are other ways outside SQL to do this.  I know.  I
believe I can code a moving average within 5-15 minutes, but that's not the
point, that's the last resort.
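
Just to show what I mean by "last resort", here is a rough sketch (no error 
checking, and the start-of-series handling is simplistic) of composing two such 
functions over plain arrays:

  #include <stddef.h>

  /* rate of change: out[i] = price[i] - price[i - days] (0 for the first rows) */
  static void roc(const double *price, double *out, size_t n, size_t days)
  {
      for (size_t i = 0; i < n; i++)
          out[i] = (i >= days) ? price[i] - price[i - days] : 0.0;
  }

  /* simple moving average over a trailing window of w samples */
  static void moving_average(const double *in, double *out, size_t n, size_t w)
  {
      double sum = 0.0;
      for (size_t i = 0; i < n; i++) {
          sum += in[i];
          if (i >= w) sum -= in[i - w];
          out[i] = sum / (double)(i + 1 < w ? i + 1 : w);
      }
  }

  /* composition, i.e. a smoothed rate of change:
     roc(close, tmp, n, 10);  moving_average(tmp, smoothed, n, 20);  */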


>  You will either find yourself limiting the analyses to what is convenient
> to express in SQL, or you will spend much more time writing queries than
> you would spend describing your data processing in a form more suited to
> functions.  SQL is primarily a language for extracting sets from other
> sets, according to defined criteria (which include set relationships). Your
> analyses, unless they are like nothing I've ever seen or imagined, are
> going to be derived from functions on time series rather than sets, per se.
>  I expect you would find a signal processing library, such as can be found
> in Matlab, Octave, or Scilab, to be a much better start than what you might
> write in SQL in reasonable time.
>

I will definitely look into it. Although I might appear to, I don't want to
close doors just because I have a strong desire to do things my way! ;)


> That said, it is not hard to imagine that selection of datasets might be
> done with SQL, and perhaps some query criteria might include measures
> derived from your own custom functions on time series.


As I mentioned in my reply to Petite Abeille, a mix of a cross-join and
custom aggregate functions might lead to something resembling analytical
functions.


> There is no real either/or choice posed here.  You can create your own
> custom functions to be incorporated into SQLite queries, including
> aggregate functions.  The aggregate functions might produce some output
> other than what is returned to SQLite.  What I think will be unreasonable
> or unduly limiting is making it happen in bare SQLite.  You will need more.
>

It comes back to your first statement: "I highly doubt that a majority of
your analysis algorithms are going to be expressible in SQL [...]".  The
key is determining what needs to be in the core and what does not.  And
perhaps, what simply cannot be...!  But I'm just getting started, and I will
begin my undergrad studies in maths in September; until then, I have plenty
of time to struggle with the basics! ;)

Thanks a lot for the information and guidance you provided,
  Simon


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Simon
> Generally speaking, analytical functions (aka windowing functions [1])
> would appear to be the most useful for your endeavor.
>
> Sadly, SQLite doesn't provide anything like this out-of-the-box.
>

I wasn't aware of the term.  Thanks!  I'll be able to google on that now!
;)

> And here are examples of custom functions, including stdev, variance, mode,
> median, lower_quartile, upper_quartile, etc:
>
> http://www.sqlite.org/contrib/download/extension-functions.c?get=25
>

This example file is seriously awesome!  I'll practice a few things, thanks!

But I don't think aggregates are the key here...  Basically, the kind of
function I need is something like this:
  For each row, in this column, calculate the foobar result on all (or a
group of) the values of another column.
I think this is the definition of windowing (I just read a few lines on the
topic at the moment).

I think it is possible to create a custom aggregate function which would
work on a cross-join of the data to analyse.  The cross-join makes all the
data available to each bucket (group by Date, for example), and each bucket
is basically one row of the whole data.  The aggregate function would thus
work on all the data, and output its result in every row of the result
set.  The aggregate function would be in charge of handling its own window
(similar to the "varianceStep" function that ignores null values).
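
To make the shape of such a query concrete (a sketch only: 'win_avg' is an 
imaginary aggregate I would still have to register through 
sqlite3_create_function, and 'quotes' stands in for my price table):

  /* every GROUP BY bucket (one per date) sees every row of the table through
     the cross-join; the aggregate decides which rows fall inside its window */
  const char *sql =
      "SELECT d.date, win_avg(a.date, d.date, a.closing_price) AS smoothed"
      "  FROM quotes AS d CROSS JOIN quotes AS a"
      " GROUP BY d.date ORDER BY d.date";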

Although this seems like a possible path, I don't feel it's a very appealing
one.  But I'll think about it and maybe I can come up with something.

Simon


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Gabor Grothendieck
On Tue, Mar 27, 2012 at 3:02 PM, Simon  wrote:
> select closing_price, moving_average( funky_oscillator( closing_price ) )...

There is a moving average calculation in SQLite here but given the
complexity you might prefer to do the analytical portion in your
program:
http://code.google.com/p/sqldf/#Example_16._Moving_Average
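
Not the sqldf example itself, but the usual shape is a bounded self-join.  A 
rough sketch, assuming a 'quotes' table whose rowids happen to follow date 
order:

  /* 20-row trailing average of the closing price */
  const char *ma20 =
      "SELECT a.date, AVG(b.closing_price) AS ma20"
      "  FROM quotes AS a JOIN quotes AS b"
      "    ON b.rowid BETWEEN a.rowid - 19 AND a.rowid"
      " GROUP BY a.rowid ORDER BY a.date";

It works, but it leans on rowid ordering and re-scans the window for every 
output row, which is part of the complexity mentioned above.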

It would be nice if SQLite had SQL windowing functions to simplify
these sorts of calculations.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Doug Currie

On Mar 27, 2012, at 3:46 PM, Larry Brasfield wrote:

> A DBMS is a good way to keep your raw data.  But I highly doubt that a 
> majority of your analysis algorithms are going to be expressible in SQL 
> without going way beyond the intended purpose of the language.  You will 
> either find yourself limiting the analyses to what is convenient to express 
> in SQL, or you will spend much more time writing queries than you would spend 
> describing your data processing in a form more suited to functions.  

Yes

> […]  I expect you would find a signal processing library, such as can be 
> found in Matlab, Octave, or Scilab, to be a much better start than what you 
> might write in SQL in reasonable time.

Or use a Statistical Computing language and environment such as R with SQLite

http://www.r-project.org/

http://cran.r-project.org/web/packages/RSQLite/index.html


e



Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Larry Brasfield

> Hi there,
>   I'm about to start a project I have been thinking about for a long
> while.  I basically wish to analyse stock market data.  I already have the
> data in a table and I'm now in the process of writing my own indicators and
> oscillators.  I hope to learn while re-inventing this wheel and perhaps
> explore different things unavailable in most stocks analysis software I
> found.
> 
>   I would love to do an sql query that would look like the following ones.
> I wonder if they are possible and valid applications for SQL and what would
> be the proper implementation for these.  I know I can calculate all this
> using C, but it would be most useful (to my later projects) to do it in SQL
> directly (my last example gives you an idea where I'm going).
> 
> select max(opening_price, closing_price, high_price) - min(opening_price,
> closing_price, low_price) as day_range...
> select closing_price, moving_average(20,closing_price),
> exp_mov_avg(20,closing_price)...
> select closing_price, moving_average( funky_oscillator( closing_price ) )...
> 
> I think creating a module and using virtual tables may be an idea...  but
> doing the above selects would involve some serious twists I think.  What
> I'd need are virtual functions or something like that...


A DBMS is a good way to keep your raw data.  But I highly doubt that a 
majority of your analysis algorithms are going to be expressible in SQL 
without going way beyond the intended purpose of the language.  You will 
either find yourself limiting the analyses to what is convenient to 
express in SQL, or you will spend much more time writing queries than 
you would spend describing your data processing in a form more suited to 
functions.  SQL is primarily a language for extracting sets from other 
sets, according to defined criteria (which include set relationships). 
Your analyses, unless they are like nothing I've ever seen or imagined, 
are going to be derived from functions on time series rather than sets, 
per se.  I expect you would find a signal processing library, such as 
can be found in Matlab, Octave, or Scilab, to be a much better start 
than what you might write in SQL in reasonable time.


That said, it is not hard to imagine that selection of datasets might be 
done with SQL, and perhaps some query criteria might include measures 
derived from your own custom functions on time series.



> Can you guys confirm with me whether I'm asking too much and I should
> concentrate on a C or C++ implementation of my functions, or is there a way
> to make it happen in sqlite?


There is no real either/or choice posed here.  You can create your own 
custom functions to be incorporated into SQLite queries, including 
aggregate functions.  The aggregate functions might produce some output 
other than what is returned to SQLite.  What I think will be 
unreasonable or unduly limiting is making it happen in bare SQLite.  You 
will need more.
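
As a rough, untested sketch of that last point (the names here are made up): the 
pApp argument of sqlite3_create_function gives the aggregate a pointer to your 
own storage, so the real results can land in your C structures rather than in 
the result set.

  #include <sqlite3.h>
  #include <stddef.h>

  typedef struct { double *buf; size_t len, cap; } Series;

  static void captureStep(sqlite3_context *ctx, int argc, sqlite3_value **argv)
  {
      Series *s = (Series *)sqlite3_user_data(ctx);   /* the pApp pointer */
      (void)argc;
      if (s->len < s->cap)
          s->buf[s->len++] = sqlite3_value_double(argv[0]);
  }

  static void captureFinal(sqlite3_context *ctx)
  {
      sqlite3_result_null(ctx);     /* SQL itself gets nothing useful back */
  }

  /* given a Series you own:
     sqlite3_create_function(db, "capture", 1, SQLITE_UTF8, &series,
                             0, captureStep, captureFinal);             */

A plain "SELECT capture(closing_price) FROM ..." then fills the buffer for 
whatever processing follows in C.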



> Thanks,
>   Simon

Have fun!
--
Larry Brasfield


Re: [sqlite] Using SQL or C for data analysis?

2012-03-27 Thread Petite Abeille

On Mar 27, 2012, at 9:02 PM, Simon wrote:

>  I would love to do an sql query that would look like the following ones.
> I wonder if they are possible and valid applications for SQL and what would
> be the proper implementation for these.

Generally speaking, analytical functions (aka windowing functions [1]) would 
appear to be the most useful for your endeavor.

Sadly, SQLite doesn't provide anything like this out-of-the-box.

> I think creating a module ... may be an idea… 

You can always create your own (aggregate) functions:

http://www.sqlite.org/c3ref/create_function.html
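
For instance, a minimal, untested sketch of a scalar function -- a hypothetical 
pct_change(current, previous) -- registered through that API:

  #include <sqlite3.h>

  static void pctChangeFunc(sqlite3_context *ctx, int argc, sqlite3_value **argv)
  {
      (void)argc;
      if (sqlite3_value_type(argv[0]) == SQLITE_NULL ||
          sqlite3_value_type(argv[1]) == SQLITE_NULL) {
          sqlite3_result_null(ctx);
          return;
      }
      double now  = sqlite3_value_double(argv[0]);
      double then = sqlite3_value_double(argv[1]);
      if (then == 0.0) { sqlite3_result_null(ctx); return; }
      sqlite3_result_double(ctx, (now - then) / then * 100.0);
  }

  /* sqlite3_create_function(db, "pct_change", 2, SQLITE_UTF8, 0,
                             pctChangeFunc, 0, 0);                  */

Aggregates work the same way, except you supply xStep and xFinal callbacks 
instead of xFunc.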

Here is some inspiration:

SQL for Analysis and Reporting
http://docs.oracle.com/cd/B28359_01/server.111/b28313/analysis.htm

And here are examples of custom functions, including stdev, variance, mode, 
median, lower_quartile, upper_quartile, etc:

http://www.sqlite.org/contrib/download/extension-functions.c?get=25


[1] http://en.wikipedia.org/wiki/Window_function_(SQL)#Window_function