Re: [GENERAL] Query generator

2016-10-16 Thread Vinicius Segalin
2016-10-16 8:54 GMT-02:00 Andreas Seltenreich <seltenre...@gmx.de>:

> Vinicius Segalin <vinisega...@gmail.com> writes:
>
> > 2016-09-29 16:32 GMT-03:00 Julien Rouhaud <julien.rouh...@dalibo.com>:
> >
> > >  You should try sqlsmith (https://github.com/anse1/sqlsmith), which
> works
> > >  very well.
> >
> > I had found this one before, but all I could get was queries using
> > "standard" tables, like the ones from pg_catalog and
> > information_schema. It didn't generate queries from the tables I've
> > created. Was I doing something wrong?
>
> It should pick your relations with the same likelihood it picks the
> catalog tables.  You could verify this for with, e.g.,
>
> sqlsmith --dry-run | grep your_table_name
>
> Thanks to Julien, sqlsmith's master branch now has an option
> --exclude-catalog that inhibits use of catalog relations.  If you are
> building from github, you might want to give it another chance with this
> option.  I'll probably do a proper release in next two weeks.
>
> Feedback welcome :-)
>
> regards,
> Andreas
>

Yes, I was talking in private with him and he came up with this solution.
Thank you again, Julien.
I need to try it again, did a quick test on friday, but it looks like it
works perfectly! :)


Re: [GENERAL] Query generator

2016-09-29 Thread Vinicius Segalin
2016-09-29 16:32 GMT-03:00 Julien Rouhaud <julien.rouh...@dalibo.com>:

> On 29/09/2016 21:27, Vinicius Segalin wrote:
> > Hi everyone,
> >
>
> Hello,
>
> > Does anyone know a random query generator for Postgres? Something that
> > gets my schema and, based on that, generates hundreds of different
> > queries with join, group by, etc.
> >
>
> You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> very well.


I had found this one before, but all I could get was queries using
"standard" tables, like the ones from pg_catalog and information_schema. It
didn't generate queries from the tables I've created. Was I doing something
wrong?


[GENERAL] Query generator

2016-09-29 Thread Vinicius Segalin
Hi everyone,

Does anyone know a random query generator for Postgres? Something that gets
my schema and, based on that, generates hundreds of different queries with
join, group by, etc.

Thank you.


Re: [GENERAL] Predicting query runtime

2016-09-13 Thread Vinicius Segalin
2016-09-13 17:12 GMT-03:00 Oleg Bartunov :
>
>
> FYI, please check https://pgconf.ru/2016/89977
>
>
 Interesting! Was this presentation filmed? Or would you have a post or
something else with more details?


Re: [GENERAL] Predicting query runtime

2016-09-12 Thread Vinicius Segalin
2016-09-12 18:22 GMT-03:00 Istvan Soos :

> Hi Vinicius,
>
> At Heap we have non-trivial complexity in our analytical queries, and
> some of them can take a long time to complete. We did analyze features
> like the query planner's output, our query properties (type,
> parameters, complexity) and tried to automatically identify factors
> that contribute the most into the total query time. It turns out that
> you don't need to use machine learning for the basics, but at this
> point we were not aiming for predictions yet.
>

And how did you do that? Manually analyzing some queries?
But I think, as you said, it wouldn't apply for predictions, but instead
for making long queries run faster, right?


>
> As a spoiler: queries take long time because they do a lot of IO.
> Features like reachback depth and duration (e.g. what period is the
> analytical query about) can contribute a lot to the amount of IO,
> thus, the query time. I have a blog post in my queue about our
> analysis, would gladly bump its priority if there is interest in such
> details.
>

If it's not too much work, I would like to have some details on your
process. It looks it's not exactly what I'm trying to do, but would
certainly help me with my work.


> I'm also curious: if you had a great way to predict the time/cost of
> the queries, how would you use it?


I'm working on something for my master degree (it's the idea, and I really
hope I can make it possible) where I'll help the user choosing the
resources for the database that will give him the best performance (or at
least the performance he thinks it's good enough). So the idea would be to
train each machine (with different resources) and then be able to predict
for an specific query what the performance would be.

Thank you all for the answers so far. I hope we can clear my mind about
this issue.


Re: [GENERAL] Predicting query runtime

2016-09-12 Thread Vinicius Segalin
2016-09-12 17:01 GMT-03:00 Jeff Janes <jeff.ja...@gmail.com>:

> On Mon, Sep 12, 2016 at 7:03 AM, Vinicius Segalin <vinisega...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I'm trying to find a way to predict query runtime (I don't need to be
>> extremely precise). I've been reading some papers about it, and people are
>> using machine learning to do so. For the feature vector, they use what the
>> DBMS's query planner provide, such as operators and their cost. The thing
>> is that I haven't found any work using PostgreSQL, so I'm struggling to
>> adapt it.
>> My question is if anyone is aware of a work that uses machine learning
>> and PostgreSQL to predict query runtime, or maybe some other method to
>> perform this.
>>
>
> I don't know about machine learning, but if there were some way to get the
> planner to tell you predicted cost in terms of a breakdown of how many
> multiples of each *_cost factor (rather than only a grand total which is
> what it does now), then it would be fairly easy to combine that with wall
> times from log_duration and do a simple linear regression.
>
> I suspect the result would be that seq_page_cost and random_page_cost
> would have huge uncertainties on them.  And since pretty much every query
> has non-zero predicted values for at least one of those, the huge
> uncertainties would then pollute all the rest of the fitted values as
> well.  Perhaps that is where the machine learning would come in?
>
> Another issue is the predicted costs are only meant to choose between
> different plans, not to predict overall wall time. Some parts of the
> planner only have one way to do something, and so doesn't bother to compute
> a cost for that as there is no choice to be made.  This would leave glaring
> holes in the estimates (particularly for updates)
>
> But to get that data out would require quite a bit of tedious altering of
> the planner code, and then you would have to find people willing to run
> that altered code on real world databases with a high level of logging to
> gather the data.  (I suspect that gathering data from only toy databases
> would not be very useful).
>
> Cheers,
>
> Jeff
>

Modifying the planner is way too complex for me at this time, so I really
can't go into that kind of solution, but I can try to use as much as the
planner gives me today, make the best out of it and hope it's enough to
give me some satisfactory results.


Re: [GENERAL] Predicting query runtime

2016-09-12 Thread Vinicius Segalin
2016-09-12 15:16 GMT-03:00 Merlin Moncure <mmonc...@gmail.com>:

> On Mon, Sep 12, 2016 at 9:03 AM, Vinicius Segalin <vinisega...@gmail.com>
> wrote:
> > Hi everyone,
> >
> > I'm trying to find a way to predict query runtime (I don't need to be
> > extremely precise). I've been reading some papers about it, and people
> are
> > using machine learning to do so. For the feature vector, they use what
> the
> > DBMS's query planner provide, such as operators and their cost. The
> thing is
> > that I haven't found any work using PostgreSQL, so I'm struggling to
> adapt
> > it.
> > My question is if anyone is aware of a work that uses machine learning
> and
> > PostgreSQL to predict query runtime, or maybe some other method to
> perform
> > this.
>
> Well, postgres estimates the query runtime in the form of an expected
> 'cost', where the cost is an arbitrary measure based on time
> complexity of query plan.   It shouldn't be too difficult to correlate
> estimated cost to runtime cost.


That's what I though too. At least it makes sense, I guess. But sometimes
logic doesn't work, so I think only giving it a try will say.


> A statistical analysis of that
> correlation would be incredibly useful work although generating sample
> datasets would be a major challenge.
>
> merlin
>

Indeed. I'm using TPC-B along with pgbench to have some data to test (while
I don't have real data), but I'm having a hard time creating queries that
give me (very) different performance results so I can train my ML algorithm.


Re: [GENERAL] Predicting query runtime

2016-09-12 Thread Vinicius Segalin
2016-09-12 12:08 GMT-03:00 Oleksandr Shulgin <oleksandr.shul...@zalando.de>:

> On Mon, Sep 12, 2016 at 4:03 PM, Vinicius Segalin <vinisega...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I'm trying to find a way to predict query runtime (I don't need to be
>> extremely precise). I've been reading some papers about it, and people are
>> using machine learning to do so. For the feature vector, they use what the
>> DBMS's query planner provide, such as operators and their cost. The thing
>> is that I haven't found any work using PostgreSQL, so I'm struggling to
>> adapt it.
>> My question is if anyone is aware of a work that uses machine learning
>> and PostgreSQL to predict query runtime, or maybe some other method to
>> perform this.
>>
>
> Hi,
>
> I'm not aware of machine-learning techniques to achieve that (and I don't
> actually believe it's feasible), but there you might find this extension
> particularly useful: https://www.postgresql.org/docs/9.5/static/
> pgstatstatements.html
>

I'll read about it and see how it can help me. Thank you


> Can you share some links to the papers you are referring to (assuming
> these are publicly available)?
>

I don't have them all now, but these are some of them:

http://dl.acm.org/citation.cfm?id=1845166
https://people.eecs.berkeley.edu/~jordan/papers/ganapathi-etal-icde09.pdf


[GENERAL] Predicting query runtime

2016-09-12 Thread Vinicius Segalin
Hi everyone,

I'm trying to find a way to predict query runtime (I don't need to be
extremely precise). I've been reading some papers about it, and people are
using machine learning to do so. For the feature vector, they use what the
DBMS's query planner provide, such as operators and their cost. The thing
is that I haven't found any work using PostgreSQL, so I'm struggling to
adapt it.
My question is if anyone is aware of a work that uses machine learning and
PostgreSQL to predict query runtime, or maybe some other method to perform
this.

Thank you.

Best regards,

Vinicius Segalin