Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-03 Thread Samuel Gendler
On Wed, Oct 3, 2012 at 9:31 AM, Robert Buck  wrote:

> Samuel,
>
> You asked:
>
> Some questions before I provide crosstab samples - your example query has
> a hardcoded set of keys that it is looking for, but maybe that was
> code-generated.  There are multiple forms of the crosstab function, some of
> which rely on a constant column count and others which can generate the set
> of columns based on another query.  There are complications in that second
> form relating to race conditions with regard to new keys showing up between
> the query for the columns and the query for the data, so it is important to
> understand that and either structure queries accordingly or make sure you
> execute in a transaction with sufficient transaction isolation to prevent
> the race condition from appearing.
>
> The query was hand written from examples I found on the web. The list here
> was hand written too. This will run in Ruby, so I can template as much as
> needed on that side. I can also run this in a defined transaction boundary
> rather than using auto-commit. Right now I am thumbing through your email
> and trying it out.
>

Actually, it isn't sufficient just to run within a defined transaction,
since the default isolation level of a transaction in postgresql doesn't
guarantee that you'll see the same rows in two queries during the same
transaction.  If another transaction commits between your queries, you'll
see the new rows in the 2nd query.  You need to have your transaction
isolation set to 'serializable' in order to prevent phantom reads.  The
default isolation level is 'read committed' which says you'll only see rows
that have been committed, but there's no guarantee that two queries will
get the same values if another transaction commits a change between them.

For details, see this:

http://www.postgresql.org/docs/9.1/static/transaction-iso.html

*In general, I find that it is easier to set up my crosstab processing to
generate consistent column sets by including a where clause in all but the
first query such that the where clause always matches the first
query.* That won't work if column categories can be deleted, since
you'll still
get a different column set from one query to the next, but the vast
majority of my apps don't allow actual row deletion, or it is at least
incredibly rare and only done by administrative users, not end-users of the
app.  Generally, rows representing domain entities get marked as disabled,
so rows going away isn't usually a factor for me.  It also means I don't
have to worry that someone who comes along and re-uses my code will forget
to set their transaction isolation correctly when running the crosstab
query and get wacky results.

Note that if you do use serializable transactions to do this, you should
use a read-only transaction to cut down on locking overhead and the
possibility that other transactions will fail due to serialization errors
while your crosstab is running.
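
For example, the wrapper is just (a minimal sketch; the two comments stand
in for whatever queries you actually run):

begin transaction isolation level serializable read only;
-- 1) query the distinct set of keys that will become columns
-- 2) run the crosstab (or array_agg) query against the same data
commit;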

This all seems terribly complex when describing it, but it really is pretty
easy once you've done it once.  The column-count/serializable thing applies
to using array_agg to build a crosstab, too, since you still need to ensure
that your arrays all have the same number of entries in the same order.

--sam


Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-02 Thread Samuel Gendler
One last comment - an alternative solution to this is to use array_agg(),
which will simply add each value that matches a group by clause to an
array.  If you are careful to structure your query with left joins so that
every grouping will have the same number of rows, then you can get a quick
and dirty crosstab like this:

select row_name, array_agg(value) from test t left join value_table v on
t.id = v.id group by 1;

Obviously, you'll want to deal with your potential for duplicate rows via
max() or avg() in a subquery, rather than joining directly to the table,
but you should get the idea from that example.  You can also use coalesce
to convert nulls to some other value, if required.
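
To make that concrete, a rough sketch (the key column on value_table is
assumed here, not taken from your schema): the cross join against the
distinct key list guarantees every group aggregates the same number of
elements in the same order, the subquery collapses duplicates with max(),
and coalesce turns missing entries into 0.  Note that array_agg(... order
by ...) needs 9.0 or later.

select t.row_name,
       array_agg(coalesce(v.value, 0) order by k.key) as vals
from test t
cross join (select distinct key from value_table) k
left join (select id, key, max(value) as value
             from value_table
            group by id, key) v
       on v.id = t.id and v.key = k.key
group by t.row_name;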

Since the crosstab functions already require you to do all that work with
regard to determining column names and building up the record structure,
using array_agg can be every bit as effective, since it basically requires
the same process.  First query for all possible names, then issue a query
that will cause the values to be processed by array_agg in column order,
then iterate over results, getting each array value and associating it with
a particular name.

Your result will look  like this:

id_name, start_time, end_time, array_of_values

That may or may not be convenient for you, depending upon how you are using
the resultset you get back. You'll still need to play all the same games
with regard to unioning multiple queries together to pivot data from
multiple tables into the same row.



On Tue, Oct 2, 2012 at 12:57 PM, Samuel Gendler
wrote:

>
>
> On Tue, Oct 2, 2012 at 2:45 AM, Robert Buck wrote:
>
>> Hi Samuel
>>
>> Thank you. This may be a bit of a stretch for you, but would it be
>> possible for me to peek at a sanitized version of your cross tab query, for
>> a good example on how to do this for this noob?
>>
>> This will be pretty common in my case. The biggest tables will get much
>> larger as they are raw metrics feeds, which at some point need to be fed
>> through reporting engines to analyze and spot regressions.
>>
>> Lastly, am I simply using the wrong tech for data feeds and analytics?
>> The first cut of this used flat files and R and though it scoured thousands
>> of files was much faster than the SQL I wrote here. The big goal was to get
>> this off disk and into a database, but as its highly variable, very sparse,
>> metric data, this is why I chose k-v. SQL databases are internally more
>> politically acceptable, though I am personally agnostic on the matter. In
>> the end it would be nice to directly report off a database, but so long as
>> I can transform to csv I can always perform reporting and analytics in R,
>> and optionally map and reduce natively in Ruby. Sane? Ideas? This is early
>> on, and willing to adjust course and find a better way if suggestions
>> indicate such. I've heard a couple options so far.
>>
>
> OK, you owe me.  I think I just spent more than an hour writing this up
> ;-)
>
> Given the numbers of rows you are talking about, I can't think of any good
> reason why the database shouldn't do what you need it to do pretty
> effectively/quickly.
>
> Some questions before I provide crosstab samples - your example query has
> a hardcoded set of keys that it is looking for, but maybe that was
> code-generated.  There are multiple forms of the crosstab function, some of
> which rely on a constant column count and others which can generate the set
> of columns based on another query.  There are complications in that second
> form relating to race conditions with regard to new keys showing up between
> the query for the columns and the query for the data, so it is important to
> understand that and either structure queries accordingly or make sure you
> execute in a transaction with sufficient transaction isolation to prevent
> the race condition from appearing.
>
> For crosstab queries, you generally want a query that returns results in
> the following form:
>
> row_name | key | value
>
> where row_name would be a date or some other label that all values that
> should share a row will have in common.  Key is the field that defines a
> column in the final row.  And value is the value to go in that column.  If
> it is the case that you always want ALL keys, then you can simply do a left
> join to ensure that you get a row for every key, regardless of whether
> there is an actual value.  You can use COALESCE to turn nulls resulting
> from left joins that don't match into '0' values.  If there is the
> potential for multiple rows with the same key and row_name, you need to use
> an aggregate function (like max or avg) to compress all rows into a single
> row.
>
> The crosstab(text sql) form 

Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-02 Thread Samuel Gendler
On Tue, Oct 2, 2012 at 2:45 AM, Robert Buck  wrote:

> Hi Samuel
>
> Thank you. This may be a bit of a stretch for you, but would it be
> possible for me to peek at a sanitized version of your cross tab query, for
> a good example on how to do this for this noob?
>
> This will be pretty common in my case. The biggest tables will get much
> larger as they are raw metrics feeds, which at some point need to be fed
> through reporting engines to analyze and spot regressions.
>
> Lastly, am I simply using the wrong tech for data feeds and analytics? The
> first cut of this used flat files and R and though it scoured thousands of
> files was much faster than the SQL I wrote here. The big goal was to get
> this off disk and into a database, but as its highly variable, very sparse,
> metric data, this is why I chose k-v. SQL databases are internally more
> politically acceptable, though I am personally agnostic on the matter. In
> the end it would be nice to directly report off a database, but so long as
> I can transform to csv I can always perform reporting and analytics in R,
> and optionally map and reduce natively in Ruby. Sane? Ideas? This is early
> on, and willing to adjust course and find a better way if suggestions
> indicate such. I've heard a couple options so far.
>

OK, you owe me.  I think I just spent more than an hour writing this up ;-)


Given the numbers of rows you are talking about, I can't think of any good
reason why the database shouldn't do what you need it to do pretty
effectively/quickly.

Some questions before I provide crosstab samples - your example query has a
hardcoded set of keys that it is looking for, but maybe that was
code-generated.  There are multiple forms of the crosstab function, some of
which rely on a constant column count and others which can generate the set
of columns based on another query.  There are complications in that second
form relating to race conditions with regard to new keys showing up between
the query for the columns and the query for the data, so it is important to
understand that and either structure queries accordingly or make sure you
execute in a transaction with sufficient transaction isolation to prevent
the race condition from appearing.

For crosstab queries, you generally want a query that returns results in
the following form:

row_name | key | value

where row_name would be a date or some other label that all values that
should share a row will have in common.  Key is the field that defines a
column in the final row.  And value is the value to go in that column.  If
it is the case that you always want ALL keys, then you can simply do a left
join to ensure that you get a row for every key, regardless of whether
there is an actual value.  You can use COALESCE to turn nulls resulting
from left joins that don't match into '0' values.  If there is the
potential for multiple rows with the same key and row_name, you need to use
an aggregate function (like max or avg) to compress all rows into a single
row.

The crosstab(text sql) form of the function requires that it gets a value
for every possible column in every single row.  It makes some naive
assumptions about the presence of a value for every spot in the matrix.  It
also cannot include extra columns as plain payload in the final cross tab
result.  For greater flexibility, you really want to use the crosstab(text
sql, text sql) form of the function.  This allows you to specify the set of
columns that should appear via the 2nd query.  When processing the results,
it makes fewer naive assumptions about the presence of data for every
column, so it correctly handles missing data (and, I assume, data for
columns that didn't appear in the 2nd query).  This eliminates the need for
left joins to get empty rows, though you do still need to use aggregates to
combine multiple rows into one row.

Your 2nd query would be a query that simply returns the distinct set of key
names from both of your key/value tables.  You want them in a consistent
and repeatable order, so order by id_name:

select id_name from (
select distinct m.id_name from metadata_key m
union
select distinct d.id_name from metric_def d
) q order by id_name

The nested union query is required based on what I can glean of your schema
from your examples, since you are getting keys from multiple tables and we
need them all in a single column in a consistent order.

Now you just need a query that returns

row_name | extra_col1 | extra_col2 | key | max(value)

extra_col1 and extra_col2 are going to be the begin_time and end_time from
your test table, but you are going to have to do the union thing again in
order to extract results from multiple tables into a single column, and
you'll have to do the subquery thing in order to get the rows in consistent
order again:

select row_name, begin_time, end_time, category, value from
(
select t.id_name as row_name, max(t.begin_time) as begin_time,
max(t.end_time) as end_time,
  m.id_nam
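
The message is cut off above; purely to illustrate where the query was
heading - the value tables (metadata_value, metric_value), their join
columns, and the final output column names are invented, since they never
appear in the thread - the finished statement would look roughly like this:

select * from crosstab(
  $$
  select t.id_name as row_name,
         max(t.begin_time) as begin_time,
         max(t.end_time) as end_time,
         m.id_name as key,
         max(v.value) as value
    from test t
    join metadata_value v on v.test_id = t.id
    join metadata_key m on m.id = v.metadata_key_id
   group by t.id_name, m.id_name
  union all
  select t.id_name, max(t.begin_time), max(t.end_time), d.id_name,
         max(v.value)
    from test t
    join metric_value v on v.test_id = t.id
    join metric_def d on d.id = v.metric_def_id
   group by t.id_name, d.id_name
   order by 1, 4
  $$,
  $$
  select id_name from (
    select m.id_name from metadata_key m
    union
    select d.id_name from metric_def d
  ) q order by id_name
  $$
) as ct(row_name text,
        begin_time timestamp,
        end_time timestamp,
        cpu_idle numeric,   -- one column per key, in the category
        mem_used numeric);  -- query's order; these names are made up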

Re: [SQL] [noob] How to optimize this double pivot query?

2012-10-02 Thread Samuel Gendler
On Mon, Oct 1, 2012 at 11:46 PM, Thomas Kellerer  wrote:

>
> That combined with the tablefunc module (which let's you do pivot queries)
> might
> make your queries substantially more readable (and maybe faster as well).
>
>
I would think that using the crosstab functions in tablefunc would solve the
problem without needing a complete change of structure. I've built
crosstabs over a whole lot more than 54K rows in far, far less time (and
resulting in more than 35 columns, too) than the 11 seconds that was quoted
here, without feeling the need to deal with hstore or similar.  In fact,
wouldn't hstore actually make it more difficult to build a crosstab query
than the schema that he has in place now?

--sam


Re: [SQL] Need help with a special JOIN

2012-09-29 Thread Samuel Gendler
On Sat, Sep 29, 2012 at 9:02 AM, Andreas  wrote:

> Hi,
>
> asume I've got 2 tables
>
> objects ( id int, name text )
> attributes ( object_id int, value int )
>
> attributes   has a default entry with object_id = 0 and some other where
> another value should be used.
>
> e.g.
> objects
> (   1,   'A'   ),
> (   2,   'B'   ),
> (   3,   'C'   )
>
> attributes
> (   0,   42   ),
> (   2,   99   )
>
> The result of the join should look like this:
>
> object_id, name, value
> 1,   'A',   42
> 2,   'B',   99
> 3,   'C',   42
>
>
> I could figure something out with 2 JOINs, UNION and some DISTINCT ON but
> this would make my real query rather chunky.   :(
>
> Is there an elegant way to get this?
>
>
I'm not sure it is any more elegant than the kind of solution you suggest,
but this works:

# select id, name, value from
(select *, count(o.id) over (partition by o.id) as total from objects o
join attributes a on a.object_id = o.id or a.object_id = 0) q
where total = 1 or object_id != 0;
 id | name | value
----+------+-------
  1 | A    |    42
  2 | B    |    99
  3 | C    |    42
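
Another way to get the same result, as a sketch (it assumes there is
exactly one default row with object_id = 0), is to left join the specific
attribute and fall back to the default with coalesce:

select o.id as object_id, o.name, coalesce(a.value, d.value) as value
from objects o
left join attributes a on a.object_id = o.id
join attributes d on d.object_id = 0;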


Re: [SQL] ORDER BY COLUMN_A, (COLUMN_B or COLUMN_C), COLUMN_D

2012-09-12 Thread Samuel Gendler
You put a conditional clause in the order by statement, either by
referencing a column that is populated conditionally, like this

select A, case when B < C then B else C end as condColumn, B, C, D
from ...
where ...
order by 1, 2, 5

or

select A, case when B < C then B else C end as condColumn, B, C, D
from ...
where ...
order by A,condColumn, D

or you can just put the conditional statement in the order by clause (which
surprised me, but I tested it)

select A, B, C, D
from ...
where ...
order by A, case when B < C then B else C end, D
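
For the specific smaller-of-two-columns case here, least() does the same
job as the case expression (for non-null values; the two differ when C is
null):

select A, B, C, D
from ...
where ...
order by A, least(B, C), D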



On Wed, Sep 12, 2012 at 2:44 PM, Rodrigo Rosenfeld Rosas  wrote:

> This is my first message in this list :)
>
> I need to be able to sort a query by column A, then B or C (which one
> is smaller, both are of the same type and table but on different left
> joins) and then by D.
>
> How can I do that?
>
> Thanks in advance,
> Rodrigo.
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: [SQL] weird join producing too many rows

2012-09-12 Thread Samuel Gendler
I'll admit I don't see any reason why you should get duplicate rows based
on the data you've provided, but I am wondering why you are using the
subquery instead of just 'where r.r_id = 5695'

select p.p_id, r.pr_ind
from pieces p
join pieces_requests r on p.p_id = r.p_id
where r.r_id = 5695

Though I'll be the first to admit that that seems to me like it ought to
return the exact same rows as both your queries.  Are you sure you don't
have multiple rows in pieces_requests with the same p_id, r_id pairing?
Your join must be resulting in multiple rows for each p_id somehow - most
likely because it isn't restricted to r_id = 5695, so each piece matches
once for every request it belongs to.

On Tue, Sep 11, 2012 at 7:42 AM, Gary Stainburn <
gary.stainb...@ringways.co.uk> wrote:

> I have a pieces table with p_id as primary key.
> I have a requests table with r_id as primary key.
> I have a pieces_requests table with (p_id, r_id) as primary key, and an
> indicator pr_ind reflecting the state of that relationship
>
> A single select of details from the pieces table based on an entry in the
> pieces_requests table returns what I expect.
>
> users=# select * from pieces_requests where r_id=5695;
>  p_id | r_id | pr_ind
> ------+------+--------
>  5102 | 5695 |
>  5020 | 5695 |
>  5065 | 5695 |
>  5147 | 5695 |
>  4917 | 5695 |
>  5165 | 5695 |
>  4884 | 5695 |
>  5021 | 5695 |
>  5121 | 5695 |
>  5130 | 5695 |
>  5088 | 5695 |
>  4900 | 5695 |
>  4197 | 5695 |
>  2731 | 5695 |
> (14 rows)
>
> users=# select p_id, p_name from pieces where p_id in (select p_id from
> pieces_requests where r_id=5695);
>  p_id | p_name
> ------+---------
>  4884 | LSERVB
>  4900 | ESALES4
>  5102 | LSALES6
>  2731 | LSALESE
>  5147 | ESALES5
>  5020 | LSALES5
>  5130 | LSALES3
>  5021 | WSERV7
>  4917 | LSALESA
>  5165 | LSERV8
>  5088 | LADMIN1
>  5121 | LSALESL
>  4197 | WSERV1
>  5065 | LSALESG
> (14 rows)
>
> users=#
>
>
> However, when I try to include the pr_ind in the result set I get multiple
> records (at the moment pr_ind is NULL for every record)
>
> I've tried both
>
> select p.p_id, r.pr_ind
> from pieces p
> join pieces_requests r on p.p_id = r.p_id
> where p.p_id in (select p_id from pieces_requests where r_id=5695)
>
> and
>
> select p.p_id, r.pr_ind
> from pieces p, pieces_requests r
> where p.p_id = r.p_id and
> p.p_id in (select p_id from pieces_requests where r_id=5695)
>
> Both result in the following. Can anyone see why. I think I'm going blind
> on
> this one
>
> users=# select p.p_id, p_name, r.pr_ind
> users-# from pieces p, pieces_requests r
> users-# where p.p_id = r.p_id and
> users-# p.p_id in (select p_id from pieces_requests where r_id=5695);
>  p_id | p_name  | pr_ind
> ------+---------+--------
>  2731 | LSALESE |
>  2731 | LSALESE |
>  2731 | LSALESE |
>  2731 | LSALESE |
>  4884 | LSERVB  |
>  4900 | ESALES4 |
>  4900 | ESALES4 |
>  4917 | LSALESA |
>  4197 | WSERV1  |
>  4197 | WSERV1  |
>  4884 | LSERVB  |
>  5021 | WSERV7  |
>  5065 | LSALESG |
>  5065 | LSALESG |
>  4884 | LSERVB  |
>  5121 | LSALESL |
>  5088 | LADMIN1 |
>  5130 | LSALES3 |
>  5147 | ESALES5 |
>  5102 | LSALES6 |
>  5020 | LSALES5 |
>  5065 | LSALESG |
>  5147 | ESALES5 |
>  4917 | LSALESA |
>  5165 | LSERV8  |
>  4884 | LSERVB  |
>  5021 | WSERV7  |
>  5121 | LSALESL |
>  5130 | LSALES3 |
>  5088 | LADMIN1 |
>  4900 | ESALES4 |
>  4197 | WSERV1  |
>  2731 | LSALESE |
> (33 rows)
>
> users=#
>
> --
> Gary Stainburn
> Group I.T. Manager
> Ringways Garages
> http://www.ringways.co.uk
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


[SQL] prepared statement in crosstab query

2012-08-31 Thread Samuel Gendler
I have the following crosstab query, which needs to be parameterized in the
2 inner queries:

SELECT * FROM crosstab(
$$
SELECT t.local_key,
   s.sensor_pk,
   CASE WHEN t.local_day_abbreviation IN (?,?,?,?,?,?,?) THEN
q.dpoint_value
   ELSE NULL
   END as dpoint_value
FROM dimensions.sensor s
INNER JOIN dimensions.time_ny t
ON s.building_id = ?
AND s.sensor_pk IN
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
AND t.local_key BETWEEN ? AND ?
LEFT OUTER JOIN (
SELECT f.time_fk, f.sensor_fk,
   cast(avg(f.dpoint_value) as numeric(10,2)) as dpoint_value
FROM facts.bldg_4_thermal_fact f
WHERE f.time_fk BETWEEN ? AND ?
  AND f.sensor_fk IN
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
GROUP BY 1,2) q
ON q.time_fk = t.local_key
AND q.sensor_fk = s.sensor_pk
ORDER BY 1,2
$$,
$$
SELECT s.sensor_pk
FROM dimensions.sensor s
WHERE s.building_id = ?
  AND s.sensor_pk IN
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
ORDER BY 1
$$
) q(time_key bigint, a4052 real,a4053 real,a4054 real,a4055 real,a4056
real,a4057 real,a4058 real,a4059 real,a4060 real,a4061 real,a4062
real,a4063 real,a4064 real,a4065 real,a4066 real,a4067 real,a4068
real,a4069 real,a4070 real,a4071 real,a4072 real,a4073 real,a4074
real,a4075 real,a4076 real,a4077 real,a4078 real,a4079 real)




However, when I attempt to create a prepared statement in java (or groovy,
or as a hibernate sqlQuery object) with the following set of parameters
(the counts do match), I always get an exception telling me the following




[Mon, Tue, Wed, Thu, Fri, Sat, Sun, 4, 4052, 4053, 4054, 4055, 4056, 4057,
4058, 4059, 4060, 4061, 4062, 4063, 4064, 4065, 4066, 4067, 4068, 4069,
4070, 4071, 4072, 4073, 4074, 4075, 4076, 4077, 4078, 4079, 20120402,
20120404, 20120402, 20120404, 4052, 4053, 4054, 4055, 4056,
4057, 4058, 4059, 4060, 4061, 4062, 4063, 4064, 4065, 4066, 4067, 4068,
4069, 4070, 4071, 4072, 4073, 4074, 4075, 4076, 4077, 4078, 4079, 4, 4052,
4053, 4054, 4055, 4056, 4057, 4058, 4059, 4060, 4061, 4062, 4063, 4064,
4065, 4066, 4067, 4068, 4069, 4070, 4071, 4072, 4073, 4074, 4075, 4076,
4077, 4078, 4079]

Caused by: org.postgresql.util.PSQLException: The column index is out of
range: 1, number of columns: 0.
at
org.postgresql.core.v3.SimpleParameterList.bind(SimpleParameterList.java:53)
at
org.postgresql.core.v3.SimpleParameterList.setStringParameter(SimpleParameterList.java:118)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.bindString(AbstractJdbc2Statement.java:2184)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.setString(AbstractJdbc2Statement.java:1303)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.setString(AbstractJdbc2Statement.java:1289)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.setObject(AbstractJdbc2Statement.java:1763)
at
org.postgresql.jdbc3g.AbstractJdbc3gStatement.setObject(AbstractJdbc3gStatement.java:37)
at
org.postgresql.jdbc4.AbstractJdbc4Statement.setObject(AbstractJdbc4Statement.java:46)
at
org.apache.commons.dbcp.DelegatingPreparedStatement.setObject(DelegatingPreparedStatement.java:169)
at
org.apache.commons.dbcp.DelegatingPreparedStatement.setObject(DelegatingPreparedStatement.java:169)




I've tried a number of different escaping mechanisms but I can't get
anything to work.  I'm starting to think that postgresql won't allow me to
do parameter replacement in the inner queries.  Is this true? The query
runs just fine if I manually construct the string, but some of those params
are user input so I really don't want to just construct a string if I can
avoid it.

Any suggestions?

Or can I create a prepared statement and then pass it in as a param to
another prepared statement?

Something like:

SELECT * FROM crosstab(?, ?) q(time_key bigint, a4052 real,a4053 real,a4054
real,a4055 real,a4056 real,a4057 real,a4058 real,a4059 real,a4060
real,a4061 real,a4062 real,a4063 real,a4064 real,a4065 real,a4066
real,a4067 real,a4068 real,a4069 real,a4070 real,a4071 real,a4072
real,a4073 real,a4074 real,a4075 real,a4076 real,a4077 real,a4078
real,a4079 real)

With each '?' being passed a prepared statement?  That'd be a really cool
way to handle it, but it seems unlikely to work.

Doing the whole thing in a stored proc isn't really easily done - at least
with my limited knowledge of creating stored procs, since all of the lists
are of varying lengths, as are the number of returned columns (which always
matches the length of the last 3 lists plus 1).
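
One direction that might work, as an untested sketch: since crosstab()'s
arguments are ordinary text values, the two inner queries can be built as
whole strings and bound as parameters, with only the column definition list
spelled out literally (user-supplied values still have to be safely quoted
into those strings on the client side).  The error above looks like the
driver treating the ? markers inside the dollar-quoted strings as literal
text, so it finds no bindable parameters at all.  Table and column names
below are placeholders:

prepare ct(text, text) as
select * from crosstab($1, $2)
    as q(time_key bigint, s1 real, s2 real, s3 real);

execute ct(
  'select time_fk, sensor_fk, avg(dpoint_value)
     from some_fact group by 1, 2 order by 1, 2',
  'select distinct sensor_fk from some_fact order by 1'
);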


[SQL] query structure for selecting row by tags

2012-08-01 Thread Samuel Gendler
I need to tag entities in my db with arbitrary strings.  That part is
simple enough. The difficulty is that I need to select entities based on
whether they match an expression built via boolean combinations of tags:

(tag1 || tag2 || tag3) && tag4

I can roll my own tagging mechanism, in which case it'd probably look like
this:

create table entity (
id bigserial not null primary key,
...
)

create table entity_tags (
entity_fk bigint not null,
tag varchar
)

Or I can use a generic tagging plugin in my webapp framework (grails,
taggable plugin, hibernate under the hood), in which case the table
structure looks like this:

create table entity (
id bigserial not null primary key,

)

create table tag (
id bigserial,
tag varchar
)

create table taglink (
entity_ref bigint,
entity_type varchar, -- stores the type of entity since this table is
used for all entities in the system
tag_ref bigint
)

In the grails-native version, there is no explicit foreign key to a single
entity table, since entity_ref is used as a foreign key into as many entity
tables as there are taggable entities, but within any given query, its role
is to act as a foreign key to the table specified in entity_type.  tag_ref
is always a foreign key to the tags table, which does eliminate the
replication of the tag string in every row, but given that most tag names
are likely to be pretty short, it probably doesn't save much.

I don't see a very strong reason to support one version over the other, as
the problem of constructing a query that can resolve entity_ids based on
the expression given at the top of this message seems to have similar
difficulty in either context.  My question is, is there a sneaky way to
turn my expression into a query other than the following:

For every tag element in the expression, join to the tag table one time
(via taglink, if necessary), and then express a where condition that looks
similar to the expression.  If each tag table join is aliased as t1, t2,
t3, ..., the where clause would look like this:

select distinct e.id
from entity e
join entity_tags t1 on t1.entity_fk = e.id
join entity_tags t2 on t2.entity_fk = e.id
join entity_tags t3 on t3.entity_fk = e.id
join entity_tags t4 on t4.entity_fk = e.id
where ((t1.tag = 'tag1' or t2.tag = 'tag2' or t3.tag = 'tag3') and t4.tag =
'tag4') and [rest of where clause goes here]

This has the downside of creating a huge cross multiple of the entity_tags
table with itself, though I could keep it from getting out of hand by
limiting the join to only those tags that are used in the expression -

join entity_tags t1 on t1.entity_fk = e.id and t1.tag in ('tag1', 'tag2',
'tag3', 'tag4')
join entity_tags t2 on t2.entity_fk = e.id and t2.tag in ('tag1', 'tag2',
'tag3', 'tag4')
join entity_tags t3 on t3.entity_fk = e.id and t3.tag in ('tag1', 'tag2',
'tag3', 'tag4')
join entity_tags t4 on t4.entity_fk = e.id and t4.tag in ('tag1', 'tag2',
'tag3', 'tag4')

There is no reason for a tag expression to ever include more than 10 tag
elements, so I'd have no trouble limiting the number of tags allowed in an
expression to a low number like that in my expression parser.

But maybe there is a better mechanism, one which aggregates all tags into
an array and then builds a where clause that looks for values in a single
array column?  In my app, there will be a far higher number of entities
(millions) than tags (hundreds), with many tags shared amongst many
entities, but no entity having more than maybe 5-10 tags (and no entity
having less than 4 tags).  An array mechanism would look like this:

select e.id
from entity e
join entity_tags t on t.entity_fk = e.id and t.tag in ('tag1', 'tag2',
'tag3', 'tag4')
group by e.id
having ((array_agg(t.tag) && ARRAY['tag1','tag2','tag3']::varchar[]) and
array_agg(t.tag) @> ARRAY['tag4'])

though simplifying the having clause that way would be difficult, so it
would probably be more like this:
having ((array_agg(t.tag) @> ARRAY['tag1']::varchar[] or
array_agg(t.tag) @> ARRAY['tag2']::varchar[] or
array_agg(t.tag) @> ARRAY['tag3']::varchar[]) and
array_agg(t.tag) @> ARRAY['tag4']::varchar[])

Hopefully, the query planner would simplify that?

Additionally, there are actually 4 columns of the entity that I want to
treat as tags in the context of a query that selects based on tag
expression.  My intent was to just create a tag that contains the same
value as each of the 4 columns and deal with the maintenance of ensuring
that whenever the columns are updated, the tags assigned to the entity
reflect the change.  This spares me from having to write a where condition
that checks tag column and 4 separate entity columns for each expression
element.  However, maybe it is possible to roll the 4 columns up into the
array of tag values somehow, eliminating the requirement of storing the
string values both in the tags table and in the columns of the entity
table.  That would be a big normalization/maintenance win.

I'm ope

Re: [SQL] How to solve the old bool attributes vs pivoting issue?

2012-06-27 Thread Samuel Gendler
On Wed, Jun 27, 2012 at 7:26 PM, David Johnston  wrote:

> On Jun 27, 2012, at 21:07, Andreas  wrote:
>
>
> You should look and see whether the hstore contrib module will meet your
> needs.
>
> http://www.postgresql.org/docs/9.1/interactive/hstore.html
>
>
hstore is certainly an option, as are the crosstab functions in the
tablefunc contrib module, which can definitely do what you are looking for.
 The 2-query form of the function would allow you to generate a resultset
that gives true/false/null for every possible attribute without having to
have a column for every attribute, yet you'd have all of the convenience of
a normalized schema representing attributes when doing other queries. There
are a fair number of threads about using the crosstab functions in the
archives of this list.

--sam


Re: [SQL] Insane behaviour in 8.3.3

2012-06-15 Thread Samuel Gendler
On Fri, Jun 15, 2012 at 2:28 AM, Samuel Gendler
wrote:

>
>
> On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <
> ach...@matrix.gatewaynet.com> wrote:
>
>>
>> And i forgot to mention, minicom term emulation quality sucks, even
>> giving simple
>> shell commands is a PITA, upgrading the whole fleet would mean bast case
>> scenario
>> minimum 21K USD for the whole fleet + suspension of all other activities
>> for two months.
>> If physical travel was involved, the cost would be increased at even
>> higher levels.
>>
>
> And what is the cost of data corruption on large numbers of systems?  And
> how much to fix that, especially if multiple systems fail at the same time?
>  Some things aren't free. $21K in exchange for NOT having had to keep
> systems up to date for 4 years seems like a decent trade.
>
>
Just call up an oracle sales rep and get a price quote for a single
baseline system.  Put that next to the postgresql upgrade cost for your
whole fleet.


Re: [SQL] Insane behaviour in 8.3.3

2012-06-15 Thread Samuel Gendler
On Fri, Jun 15, 2012 at 1:24 AM, Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

>
> And i forgot to mention, minicom term emulation quality sucks, even giving
> simple
> shell commands is a PITA, upgrading the whole fleet would mean bast case
> scenario
> minimum 21K USD for the whole fleet + suspension of all other activities
> for two months.
> If physical travel was involved, the cost would be increased at even
> higher levels.
>

And what is the cost of data corruption on large numbers of systems?  And
how much to fix that, especially if multiple systems fail at the same time?
 Some things aren't free. $21K in exchange for NOT having had to keep
systems up to date for 4 years seems like a decent trade.


Re: [SQL] sub query and AS

2012-05-24 Thread Samuel Gendler
On Wed, May 23, 2012 at 12:07 PM, Lee Hachadoorian <
lee.hachadooria...@gmail.com> wrote:

> On Wed, May 23, 2012 at 5:24 AM, Ferruccio Zamuner 
> wrote:
> > Hi,
> >
> > I like PostgreSQL for many reasons, one of them is the possibility to use
> > sub query everywhere. Now I've found where it doesn't support them.
> >
> > I would like to use a AS (sub query) form.
> >
> > This is an example:
> >
> > First the subquery:
> >
> > select substr(descr, 7, length(descr)-8)
> >  from (select string_agg('" int,"',freephone) as descr
> >  from (select distinct freephone
> >  from calendario order by 1
> >   ) as a
> >   ) as b;
> >
> >  substr
> >
> -
> >  "800900420" int,"800900450" int,"800900480" int,"800900570"
> int,"800900590"
> > int,"800900622" int,"800900630" int,"800900644" int,"800900688"
> > int,"800900950" int
> > (1 row)
> >
> > Then the wishing one:
> >
> > itv2=#
> > select *
> >  FROM crosstab('select uscita,freephone,id from calendario order by
> > 1','select distinct freephone from calendario order by 1')
> >   --  following AS fails
> >AS (select 'uscita int, ' || substr(descr, 7, length(descr)-8)
> >  from (select string_agg('" int,"',freephone) as descr
> >  from (select distinct freephone
> >  from calendario order by 1) as a
> >   ) as b;
> >   );
> > ERROR:  syntax error at or near "select"
> > LINE 4: ...stinct freephone from calendario order by 1') as (select
> 'us...
> >
> > More is on http://paste.scsys.co.uk/198877
> >
> > I think that AS must evaluate the sub query in advance.
> >
> > It could be possible to have such behavior?
> >
> >
> > Best regards,  \ferz
>
> Ferrucio,
>
> The problem is that you are attempting to use a "subquery" to generate
> SQL that will be evaluated by the main query. This won't work the same
> way that
>
> SELECT (SELECT 'column_name') FROM some_table;
>
> wouldn't work.
>
> If you want to dynamically generate the SQL this way you will have to
> create a function or use the DO statement (Postgres 9.0+). It would
> look something like this (not tested):
>
> DO $do$
> DECLARE
>  sql text;
>  output_columns text;
> BEGIN
>
> select 'uscita int, ' || substr(descr, 7, length(descr)-8) INTO
> output_columns
>  from (select string_agg('" int,"',freephone) as descr
> from (select distinct freephone
> from calendario order by 1) as a
>  ) as b;
>
> sql := $$select *
>  FROM crosstab('select uscita,freephone,id from calendario order by
> 1','select distinct freephone from calendario order by 1')
>
>AS pivot ($$ || output_columns || $$);$$;
>
> EXECUTE sql;
>
> END$do$;
>
> If you are using Postgres <9.0 and don't have access to the DO
> statement, you'll have to stick the above into a plpgsql function.
>

If that works, that's actually a pretty cute trick for generating the
column names for that generalized version of the crosstab() function
without having to do it on the client-side in a serialized transaction or
risking a different set of columns in the function call compared to when
the client issued the same query in order to  get the column list.  I don't
imagine that it closes the race condition entirely but it would sure make
it smaller, for those who don't set the transaction isolation level
correctly.

You should stick it in the annotated version of the documentation on the
page that describes the tablefunc functions.  Or maybe it is there in
recent versions of the page.  When I last looked at those docs, there was
no mention of it that I can remember.

--sam


Re: [SQL] order by different on mac vs linux

2012-05-16 Thread Samuel Gendler
On Wed, May 16, 2012 at 3:46 PM, Wes James  wrote:

>
>
> On Mon, May 14, 2012 at 5:00 PM, Tom Lane  wrote:
>
>> Wes James  writes:
>> > Why is there a different order on the different platforms.
>>
>> This is not exactly unusual.  You should first check to see if
>> lc_collate is set differently in the two installations --- but even if
>> it's the same, there are often platform-specific interpretations of
>> the sorting rules.  (Not to mention that OS X is flat out broken when
>> it comes to sorting UTF8 data ...)
>>
>>
> I just ran these:
>
> linux:
>
> on linux
>
> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
> pg_user;
>  case
> ---
>  FALSE
> (1 row)
>
> # show lc_collate;
>
>  lc_collate
> -
>  en_US.UTF-8
> (1 row)
>
> 
>
> on mac os x:
>
> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
> pg_user;
>  case
> --
>  TRUE
> (1 row)
>
> # show lc_collate;
>  lc_collate
> -
>  en_US.UTF-8
> (1 row)
>
>
> ---
>
> Why is the linux postgres saying false with the lc_collate set the way it
> is?
>

That's the point - UTF-8 collation is just completely broken under OS X.
 There's much previous discussion of the topic on this list and elsewhere.
 If you're developing on OS X but running linux and you are mostly using an
ascii character set in your test dataset, set your development OS X boxes
to use C collation, which will basically do what you expect it to do until
you start throwing multibyte characters at it.  If you can't constrain your
testing/development dataset in such a manner and collation order really
matters during development, then you probably shouldn't develop on OS X.  I
spent a fair amount of time investigating how to define a new charset in
what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to
behave just like it does on Linux.  I just gave it up after wasting a few
too many hours on it. It may be possible to do it, but the return on
invested time was non-existent for me so I abandoned my effort.
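
For the C-collation route, one option is to create the development database
from template0 with C collation (a sketch; the database name is made up,
and per-database collation settings need 8.4 or later):

create database devdb
  template template0
  encoding 'UTF8'
  lc_collate 'C'
  lc_ctype 'C';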


Re: [SQL] Finding Max Value in a Row

2012-05-13 Thread Samuel Gendler
On Sun, May 13, 2012 at 8:11 PM, Tom Lane  wrote:

> It strikes me that "cannot be cast" is a poor choice of words here,
> since the types *can* be cast if you try.  Would it be better if the
> message said "cannot be cast implicitly to type foo"?  We could also
> consider a HINT mentioning use of USING.
>

Without the hint, I don't think that there's a ton of value in changing the
message as proposed.  It's more accurate, so may be worth doing anyway, but
probably won't be much more helpful than the current message to someone who
hasn't encountered the problem before. If they've seen it before, the old
message is likely sufficient to remind them.


Re: [SQL] generic crosstab ?

2012-04-24 Thread Samuel Gendler
On Tue, Apr 24, 2012 at 3:37 PM, Andreas  wrote:

> Am 25.04.2012 00:04, schrieb Joe Conway:
>
>  On 04/24/2012 02:42 PM, David Johnston wrote:
>>
>>> You must specify the output record structure:
>>>
>>> SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name
>>> colN_type]* )
>>>
>>> Whether this relates to the “materialization node” message you are
>>> receiving I have no idea.
>>>
>> The error is because you are selecting from a set returning function in
>> the target list rather than the from clause. It should be more like:
>>
>> SELECT * FROM crosstab(text, text) AS ( col1_name col1_type [, colN_name
>>
>>> colN_type]* )
>>>
>>
>>
> OK now i get at least some result.
> But
> 1) I need to know how many categories will apear to construct the target
> list. This is a wee bit of a problem as this number is actually dynamic.
>
> 2) There are some rows in the resulting list with empty columns within the
> row.
> When I execute the first query for a parent ID that has gaps in the
> crosstab I see it shows no gaps in the categories when called outside
> crosstab().
> E.g. it dumps  x1, x2, x3, x4, x5 when called seperately but crosstab()
> shows
> x1, x2, null, null, x5, null, x6, x7
>
> How does this make sense ?
>

I believe that the crosstab function is fairly naive about constructing the
crosstab and requires that each row in the first query be in row-order for
the crosstab output.  It is possible that it even requires the rows in the
first query to be in column order in the crosstab output.

In other words, if the first query returns results like this:

row1, col3, value1
row1, col1, value2
row2, col1, value3
row2, col3, value4
row1, col2, value5

I believe that the last row1 entry will be dropped, or else maybe you'll
get two rows in the crosstab output, each representing row1, and I am
hypothesizing (without re-reading the documentation) that the fact that
col3 comes before col1 in row1 will also result in a hole in the output.

Basically, your first query should include "order by 1,2" rather than just
"order by 1"

Have you gone through the documentation for the crosstab functions?  I
believe that will answer most of your questions.  It has been a long time
since I've read them, but I remember them being fairly complete.

As for your question about knowing how many columns you will have, I've
always solved that by opening a serialized transaction and running a query
to get the possible set of columns, then issue the sql statement that
includes the crosstab(sql,sql) call, using the first query results to build
up the return result type.  Depending on the nature of your data set, it is
possible you can forgo the serialized transaction, since there may be
little to no risk of new column names appearing between the two queries.
 You could also just structure the queries in the second statement so that
it is guaranteed to return only the results returned by your initial query
- include a where clause which is guaranteed to return only rows that match
the columns returned by your first query.  You may wind up missing a value
that was added between the first query and the second query, but odds are
good that if your data is that volatile, it doesn't matter if you miss a
value which happened to be inserted in those few milliseconds.

--sam


Re: [SQL] generic crosstab ?

2012-04-24 Thread Samuel Gendler
On Tue, Apr 24, 2012 at 1:01 PM, Andreas  wrote:

> Hi,
>
> is there a generic solution to dump the result of a query as a crosstab,
> when I can't know how many columns I will need?
>
> E.g. I get something like this:
>
> id, x
> 1,  a
> 1,  b
> 1,  c
> 2,  l
> 2,  m
>
>
>
Yes.  You can provide a query which returns the columns to the version of
the crosstab function which looks like this:

crosstab(text source_sql, text category_sql)

It does exactly what you are looking for.  The second query returns the set
of values that act as columns in the final result (the pivot for each row
in the result returned by the first query).  This allows the function to
correctly insert a null for any column for which there is no row in the
first query results.
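
Applied to the sample data above (assuming the table is named t and a
simple presence flag is enough for the value column), that looks roughly
like this:

select * from crosstab(
  $$ select id, x, 1 from t order by 1, 2 $$,
  $$ select distinct x from t order by 1 $$
) as ct(id int, a int, b int, c int, l int, m int);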


Re: [SQL] How To Create Temporary Table inside a function

2012-03-02 Thread Samuel Gendler
On Fri, Mar 2, 2012 at 3:49 AM, Philip Couling  wrote:

> Hi Rehan
>
> I suggest attempting to drop the table before you create the temp table:
> DROP TABLE IF EXISTS table1;
>
> See:
> http://www.postgresql.org/docs/current/static/sql-droptable.html
>
>
> Also if you're using an actual TEMP table, PostgreSQL can automatically
> drop the table or just empty it once the transaction is committed:
> CREATE TEMP TABLE foo (columns...) ON COMMIT DROP;
> CREATE TEMP TABLE foo (columns...) ON COMMIT DELETE ROWS;
>
> See:
> http://www.postgresql.org/docs/current/static/sql-createtable.html
>
>
> As for filling the table, you have two options, INSERT ... SELECT ...
> Which is just the same syntax as you've used previously for MS SQL or
> create the table with data already in it it:
> CREATE TEMP TABLE foo ON COMMIT DROP AS SELECT ...
>
> See:
> http://www.postgresql.org/docs/current/static/sql-createtableas.html
>
> Hope this helps.
>

Rehan,

100% of the information in this email is available in the postgresql
documentation in locations where it would be trivially easy to find if any
attempt at all was made to look in the documentation prior to coming to the
mailing list for assistance.

It is fruitless to attempt to work as a database administrator or developer
if you do not first familiarize yourself with the documentation.  The
mailing lists are here to assist users with difficult-to-solve problems,
not as a substitute for reading the manual.  Answering easily solved questions
takes limited resources away from users who may have more intractable
issues that they need assistance with.

I can't help but notice that this isn't even the first question you've
posed to which responses have included suggestions that you read the
documentation.  Please consider taking our advice.

--sam


Re: [SQL] must appear in the GROUP BY clause or be used in an aggregate function problem

2012-01-31 Thread Samuel Gendler
On Tue, Jan 31, 2012 at 1:43 PM, Edward W. Rouse wrote:

> NM, I figured it out. The mere presence of an aggregate function and/or
> the group by clause is what’s causing all the hate for me. I will take a
> whack at getting this to work without them. Thanks all.
>

All columns that are not in an aggregate function MUST be in the group by
clause if there is any column in an aggregate function.  If you can get rid
of all aggregation, then you won't have to have the group by, either.  You
could use a correlated subquery to get the most recent report date, rather
than using max.  That would allow you to ditch all of the other
aggregation, I suspect.  But unless there is any chance of different
versions of the name fields for a given id, then it is harmless to drop the
MAX() function call and add them to the group by clause.  You get the same
effect.
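
A rough sketch of that correlated-subquery idea (the table and column names
are invented, since the original query isn't shown in this message):

select b.trackingid, b.username, b.invoicenum,
       (select max(e.entered)
          from exceptions e
         where e.trackingid = b.trackingid) as latest_exception
  from invoices b;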




>
> *From:* pgsql-sql-ow...@postgresql.org [mailto:
> pgsql-sql-ow...@postgresql.org] *On Behalf Of* Edward W. Rouse
> *Sent:* Tuesday, January 31, 2012 4:27 PM
> *To:* 'Samuel Gendler'
> *Cc:* pgsql-sql@postgresql.org
> *Subject:* Re: [SQL] must appear in the GROUP BY clause or be used in an
> aggregate function problem
>
> I would love to remove most of the aggregate functions. I am trying to
> update an existing query to provide better data and started with the
> existing one (which is where the MAX for the names came from). I originally
> tried to remove those, but I get the group by/aggregate function error if I
> do. I guess I don’t understand enough to know why these are being forced
> into that category. I have written queries in the past that had several
> columns that were not considered to require aggregation. Can someone
> enlighten me on why these are being marked as requiring group/aggregation.
>
> Something with the way the tables are joined perhaps? Can I remove the
> joins and put the a.id = c.id in the where clause and get rid of this?
>
>
> *From:* pgsql-sql-ow...@postgresql.org [mailto:
> pgsql-sql-ow...@postgresql.org] *On Behalf Of *Samuel Gendler
> *Sent:* Tuesday, January 31, 2012 3:57 PM
> *To:* Edward W. Rouse
> *Cc:* pgsql-sql@postgresql.org
> *Subject:* Re: [SQL] must appear in the GROUP BY clause or be used in an
> aggregate function problem
>
>
> On Tue, Jan 31, 2012 at 12:27 PM, Edward W. Rouse 
> wrote:
>
> And in most cases this works fine. The problem arises when invoices get
> added to the exception table due to their not being an invoice number. Even
> though we join on the tracking id, the group by on invoicenum lumps the
> different blank invoices into a single line, if the same user has more than
> 1. What we want is for each of the individual blank invoicenum entries to
> have a separate line in the result. If I remove b.invoicenum from the group
> by then I get the error in the subject line. If I try to use an aggregate
> function (like I used MAX on the names) it's even worse. MAX works on the
> names because they are all the same. MAX on the date doesn't seem to effect
> the results that I can see other than if an invoice went into exception
> more
> than once, and in that case we only want the latest one anyway.
>
> Any hints as to how to get this to not lump all of the empty invoicenums
> for
> a user into a single line? And I have mentioned putting in a dummy value
> like the date for an invoicenum, but that works as well as I expected it
> would (not at all).
>
>
> It seems like the use of the aggregation functions in your example isn't
> absolutely necessary - though perhaps the date one is.  The best solution
> would be to get rid of the aggregate columns so that you don't have this
> issue.
>
> If that isn't possible, there are definitely some kludges you can use to
> get it to work if you really don't want to union 2 queries together - one
> with and one without valid invoice ids. Assuming invoice ids are generated
> via a sequence, you can do the following to guarantee a unique 'id' for
> each empty invoice - at the expense of incrementing your sequence
> unnecessarily:
>
> COALESCE(invoiceid, nextval(invoiceid_sequence))
>
> A better version of that kludge would be to create a sequence just for
> this purpose and set it to a very negative number.  All of your generated
> fake ids will then be negative numbers (so easily identified by whatever is
> consuming the query results) and you can reset the sequence back to the
> most negative possible value whenever you get concerned about running out
> of ids, since you won't be using the same seq

Re: [SQL] must appear in the GROUP BY clause or be used in an aggregate function problem

2012-01-31 Thread Samuel Gendler
On Tue, Jan 31, 2012 at 12:27 PM, Edward W. Rouse wrote:

> And in most cases this works fine. The problem arises when invoices get
> added to the exception table due to their not being an invoice number. Even
> though we join on the tracking id, the group by on invoicenum lumps the
> different blank invoices into a single line, if the same user has more than
> 1. What we want is for each of the individual blank invoicenum entries to
> have a separate line in the result. If I remove b.invoicenum from the group
> by then I get the error in the subject line. If I try to use an aggregate
> function (like I used MAX on the names) it's even worse. MAX works on the
> names because they are all the same. MAX on the date doesn't seem to effect
> the results that I can see other than if an invoice went into exception
> more
> than once, and in that case we only want the latest one anyway.
>
> Any hints as to how to get this to not lump all of the empty invoicenums
> for
> a user into a single line? And I have mentioned putting in a dummy value
> like the date for an invoicenum, but that works as well as I expected it
> would (not at all).
>

It seems like the use of the aggregation functions in your example isn't
absolutely necessary - though perhaps the date one is.  The best solution
would be to get rid of the aggregate columns so that you don't have this
issue.

If that isn't possible, there are definitely some kludges you can use to
get it to work if you really don't want to union 2 queries together - one
with and one without valid invoice ids. Assuming invoice ids are generated
via a sequence, you can do the following to guarantee a unique 'id' for
each empty invoice - at the expense of incrementing your sequence
unnecessarily:

COALESCE(invoiceid, nextval(invoiceid_sequence))

A better version of that kludge would be to create a sequence just for this
purpose and set it to a very negative number.  All of your generated fake
ids will then be negative numbers (so easily identified by whatever is
consuming the query results) and you can reset the sequence back to the
most negative possible value whenever you get concerned about running out
of ids, since you won't be using the same sequence as the invoice table
itself.
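
A sketch of that negative-sequence kludge (untested; the sequence, table,
and column names are invented, and it assumes invoicenum is numeric):

create sequence fake_invoice_seq increment by -1 start with -1;

select coalesce(b.invoicenum, nextval('fake_invoice_seq')) as invoicenum,
       max(b.entered) as latest_exception
  from exceptions b
 group by 1;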

There are probably lots of variations on that general concept.  You can
write a window function which will remember the ids already seen/generated
for each row and just generate an arbitrary id to be used in place of null
for grouping, though you'd have to worry about using an id that has not yet
appeared but which later does appear.  Assuming you can create a bucket of
space large enough that is guaranteed to not conflict with valid invoice
ids, you could make it work.  I don't know if you can have select max(x),
invoiceid group by func(invoiceid), so you may have to structure it as
"select max(x), func(invoiceid) group by 2" which would require that your
query results can deal with faked invoiceids - negative numbers again
providing a possible way to identify them.

Doesn't postgres allow operator overloading?  Perhaps you could override
the equality operator for that type so that just a single value (the max
value, for example) always compares as unequal to itself and then use
COALESCE(invoiceid, 9223372036854775807).  All rows without a valid value
will have that value, but the overloaded equality operator will cause them
to not group together (I'm guessing.  I haven't tried this). That one makes
me nervous because of the potential for nasty side effects should something
legitimately have that value, but one could probably make the case for
having a bigger problem if a column has a value equal to max bigint.

Restructuring the query to separate valid invoice ids from invalid and/or
getting rid of the aggregation does seem like the best solution, though.


Re: [SQL] date range to set of dates expansion

2012-01-19 Thread Samuel Gendler
On Thu, Jan 19, 2012 at 8:20 AM, Adrian Klaver wrote:

> On Thursday, January 19, 2012 7:59:27 am Gary Stainburn wrote:
> > The following code works in 8.4 but not 8.3.
> > Anyone know why, or what I need to do to change it?
> >
> > SELECT aid, asid,
> >date_range (asdate, afdate)::date AS asdate,
> >acomments
> > FROM availability
> >
> > In 8.4 it returns the expanded dataset as required. In 8.3 I get:
> >
> > ERROR:  set-valued function called in context that cannot accept a set
> > CONTEXT:  PL/pgSQL function "date_range" line 4 at RETURN NEXT
>
> As to why it works in 8.4 vs 8.3
>
> http://www.postgresql.org/docs/8.4/interactive/release-8-4.html
>
> "Support set-returning functions in SELECT result lists even for functions
> that
> return their result via a tuplestore (Tom)
>
> In particular, this means that functions written in PL/pgSQL and other PL
> languages can now be called this way.'
>
> In 8.3- I believe you could only call it as
>
> SELECT * from date_range (asdate, afdate)::date AS asdate;
>

I don't think you can have that cast there when it is in the from-clause.
You can refer to its values in the select clause explicitly if you alias
the results:

select r.range_date::date from date_range(asdate, afdate) as r(range_date);

Presumably, you can fashion something that satisfies your needs by putting
something like this in a subquery which refers to the start and end date in
each row of the outer query.  I'm in a rush, so no time to figure out a
working demo for you.

--sam


Re: [SQL] how to return whole table from Function not just the id integer column

2012-01-19 Thread Samuel Gendler
On Thu, Jan 19, 2012 at 1:57 AM, Rehan Saleem  wrote:

> hi ,
>
> how can i return the whole user table from this function not just the id . 
> thanks
>
Chapter 39, specifically 39.3, of the postgresql documentation provides
all of the information necessary to answer this question.  If, after
reading that chapter and attempting to solve the problem yourself, you
still have no success, then please post your question to the list along
with your best attempt at a solution.  You cannot expect to make efficient
use of a sophisticated rdbms without first reading at least the most
relevant sections of the documentation, and the mailing lists do not have
enough experienced users to devote bandwidth to doing your work for you.

http://www.postgresql.org/docs/current/static/plpgsql-declarations.html


Re: [SQL] UPDATE COMPATIBILITY

2012-01-17 Thread Samuel Gendler
On Tue, Jan 17, 2012 at 12:49 AM, Thomas Kellerer wrote:

> Gera Mel Handumon, 17.01.2012 07:31:
>
>  What version of postgresql that the update compatibility below will be
>> implemented?
>>
>> UPDATE COMPATIBILITY
>>
>>
>> UPDATE accounts SET (contact_last_name, contact_first_name) =
>> (SELECT last_name, first_name FROM salesmen
>>  WHERE salesmen.id = accounts.sales_id);
>>
>
> None as far as I know.
>
> You need to rewrite it to:
>
> UPDATE accounts
>  SET contact_last_name = s.last_name,
>  contact_first_name = s.first_name
> FROM salesmen s
> WHERE s.id = accounts.sales_id
>
>
You can put a query in the from clause instead of a table, I believe, too:

UPDATE accounts
SET contact_last_name = q.last_name,
    contact_first_name = q.first_name
FROM (select id, last_name, first_name from salesmen) q
WHERE q.id = accounts.sales_id;

Not any more functional than the previous example, in this case, but very
useful if you need a complicated join or aggregation/grouping.

Some useful info from the postgresql documentation on the UPDATE statement (
http://www.postgresql.org/docs/9.0/static/sql-update.html ):

"When a FROM clause is present, what essentially happens is that the target
table is joined to the tables mentioned in the fromlist, and each output
row of the join represents an update operation for the target table. When
using FROM you should ensure that the join produces at most one output row
for each row to be modified. In other words, a target row shouldn't join to
more than one row from the other table(s). If it does, then only one of the
join rows will be used to update the target row, but which one will be used
is not readily predictable.

Because of this indeterminacy, referencing other tables only within
sub-selects is safer, though often harder to read and slower than using a
join."


Fwd: [SQL] i want small information regarding postgres

2012-01-05 Thread Samuel Gendler
would someone with the appropriate authority please unsubscribe this
person's email address from this list so we don't all get a bounce message
after every email we send to the list?  Thanks.

--sam


-- Forwarded message --
From: Jan Verheyden 
Date: 2012/1/2
Subject: Re: [SQL] i want small information regarding postgres


Please find an English version below.




Dear,

This is an automated reply.

The email address of Jan Verheyden is no longer active.

The user has not indicated an email address for professional purposes.

The user has not indicated an email address for personal purposes.


Please do not reply to this automated email.

Kind regards,
Information Technology
System Administration and Support
UZ Leuven


Re: [SQL] ignore unique violation OR check row exists

2012-01-04 Thread Samuel Gendler
On Wed, Jan 4, 2012 at 1:57 AM, Zdravko Balorda wrote:

> Andreas Kretschmer wrote:
>
>> rverghese  wrote:
>>
>>  I want to insert a bunch of records and not do anything if the record
>>> already
>>> exists. So the 2 options I considered are 1) check if row exists or
>>> insert
>>> and 2) ignore the unique violation on insert if row exists. Any opinions
>>> on whether it is faster to INSERT and then catch the UNIQUE
>>> VIOLATION exception and ignore it in plpgsql  versus check if row exists
>>> and
>>> INSERT if it doesn't. I can't seem to ignore the UNIQUE VIOLATION
>>> exception via php, since it is a
>>> plpgsql command, so if I have to do the check and insert, alternatively i
>>> have a function that tries to insert and then ignores the violation. I
>>> was
>>> wondering if one way was better than the other.
>>> Thanks
>>>
>>
>
> Take it out of transaction. Why is there a transaction in the first place?
> If transaction is needed, ok, but take these inserts out and everything
> will
> work as it should. Ignoring UNIQUE VIOLATION or any other error defeats
> the very
> purpose of transaction. That's why you can't ignore it.
>

Unfortunately, bulk inserts are much slower when they don't occur in a
transaction.  Try inserting 1 million rows with auto commit enabled vs 1
million rows in 1 transaction, or even 10 or 100 transactions. The
difference is enormous.  Bulk inserting into an unconstrained staging table and
then pulling just the new rows over into the destination table, all in a single
transaction, is definitely the most effective way to do this.
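
Untested, but the shape of it is something like this (table and column names
are hypothetical):

BEGIN;
-- bulk load into an unconstrained staging table
COPY staging_events (event_id, payload) FROM '/tmp/events.csv' CSV;
-- copy over only the rows that don't already exist in the real table
INSERT INTO events (event_id, payload)
SELECT s.event_id, s.payload
  FROM staging_events s
  LEFT JOIN events e ON e.event_id = s.event_id
 WHERE e.event_id IS NULL;
TRUNCATE staging_events;
COMMIT;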


Re: [SQL] i want small information regarding postgres

2012-01-02 Thread Samuel Gendler
On Sun, Jan 1, 2012 at 8:21 PM, vinodh chowdary wrote:

> Hi sir,
>
> i am using postgres as database for  my project.
> i want to convert postgres data into excel sheet.
> is there any way to do it?
> please reply me.
>
>
You can export data in CSV format, which excel can easily read, or you can
set up ODBC access and load data directly into excel via an odbc query.
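
For the CSV route, a single psql command is enough - for example (hypothetical
table and file names):

\copy (SELECT * FROM orders) TO '/tmp/orders.csv' CSV HEADER

That file opens directly in Excel.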

Google "postgresql csv" for the first solution and "postgresql excel odbc"
for the second.  In both cases, the very first link has exactly the
information you are looking for.


Re: [SQL] Current transaction is aborted, commands ignored until end of transaction block

2011-12-30 Thread Samuel Gendler
On Fri, Dec 30, 2011 at 4:03 PM, Scott Marlowe wrote:

> On Fri, Dec 30, 2011 at 4:19 PM, Tom Lane  wrote:
> > Jan Bakuwel  writes:
> >> Why-o-why have the PostgreSQL developers decided to do it this way...?
> >
> > Because starting and cleaning up a subtransaction is an expensive thing.
> > If we had auto-rollback at the statement level, you would be paying that
> > overhead for every statement in every transaction, whether you need it
> > or not (since obviously there's no way to forecast in advance whether a
> > statement will fail).  Making it depend on explicit savepoints allows
> > the user/application to control whether that overhead is expended or
> > not.
> >
> > If you want to pay that price all the time, there are client-side
> > frameworks that will do it for you, or you can roll your own easily
> > enough.  So we do not see it as a big deal that the database server
> > itself doesn't act that way.
>
> Having used PostgreSQL a LOT, I find that being able to throw an
> entire update at the db and having it fail / be rolled back / CTRL-C
> out of and fix the problem is actually much less work than the
> frameworks for other databases.  Once you've chased down bad data in a
> load file a few times, it's really pretty easy to spot and fix these
> issues and just run the whole transaction again.  Since PostgreSQL
> doesn't have a very big penalty for rolling back a whole transaction
> it's not that bad.  Some dbs, like MySQL with innodb table handler
> have a 10:1 or greater penalty for rollbacks.  Insert a million rows
> in innodb then issue a rollback and go get a sandwich.  In PostgreSQL
> a rollback is generally instantaneous, with the only real cost being
> bloat in the tables or indexes.
>

More to the point - if a statement is truly independent of all the other
statements in a transaction, it would seem that the transaction itself is
poorly defined.  The whole point of a transaction is to define an atomic
unit of work.  If you don't care about atomicity, enable auto commit and
just catch the constraint violation exception and continue on your merry
way.  Yes, on occasion, working around the way postgresql behaves causes
extra work for a developer (I don't think anyone is suggesting that it
should change the end-user experience, as was sort of implied by one
response on this thread), but code which is not atomic can cause extra work
for a developer too.  Transactions are intended to be atomic, so it makes
far more sense to me to implement it the postgres way and incur the modicum
of extra developer overhead in the few cases where I want to deal with an
acceptable constraint violation, rather than in the many cases where I want
a transaction to be atomic.
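
For the few cases where a violation really is acceptable, an explicit
savepoint around just the risky statement keeps the rest of the transaction
intact - a minimal sketch (hypothetical table):

BEGIN;
INSERT INTO colors (name) VALUES ('red');
SAVEPOINT before_dup;
INSERT INTO colors (name) VALUES ('red');  -- fails on a unique constraint
ROLLBACK TO SAVEPOINT before_dup;          -- undoes only the failed insert
INSERT INTO colors (name) VALUES ('blue');
COMMIT;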

In the example of users adding a new value to an enumerated list in the
same unit of work as the rows that reference it, it is likely not too much
work to use a trigger to check the insert prior to executing it - assuming
that list lives in another table and is referenced via a foreign key from
the table receiving the majority of your inserts.  Alternatively, if you
aren't doing a bulk insert via a copy, it probably isn't too much work to
construct the set of inserts needed for the referenced table separately and
issue those in separate transactions before doing the main transaction.


Re: [SQL] Problem with tables and columns names

2011-12-19 Thread Samuel Gendler
On Mon, Dec 19, 2011 at 12:16 AM, Gabriel Filipiak <
gabriel.filip...@gmail.com> wrote:

> Hi,
>
> so I am working on PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled
> by gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit.
>
> It seems that i can't give a table name for example testTable it has to be
> test_table, because I can't access it via psql is that right or am I doing
> something wrong? The same thing for columns. Can anyone give me a hint
> about that?
>
> What is really confusing is that I can access those tables and columns via
> SQLAlchemy.
>

Are you creating your table within a schema that is not in your
search_path?  Perhaps SQLAlchemy is providing the fully qualified name but
your query in psql is not?  The other option is case-sensitivity:

table and column names are not case-sensitive unless you quote the
names in the create statement, in which case you must always quote them
and capitalize them in the same way.

create table myschema.testtable (column1 int);  -- no quotes means
case-insensitive table and column name

should work with a query like this:

select * from myschema.TeStTaBlE

but

create table "mySchema"."TestTable" ("Column1" int) -- quotes forces case
sensitivity

will only work with queries like this:

select "Column1" from "mySchema"."TestTable"

but won't work with a query like this:

select Column1 from mySchema.TestTable

without the quotes, postgres won't recognize table and column names that
were quoted at creation, even though they are capitalized in the same way.


Re: [SQL] updating a sequence

2011-11-15 Thread Samuel Gendler
On Tue, Nov 15, 2011 at 4:28 PM, John Fabiani  wrote:

> On Tuesday, November 15, 2011 08:33:54 am Richard Broersma wrote:
> > On Tue, Nov 15, 2011 at 8:06 AM, John Fabiani 
> wrote:
> > > alter sequence somename restart with (select max(pk) from sometable).
> > >
> > > I need this for automating an ETL (using pentaho).
> >
> >
> http://www.postgresql.org/docs/9.1/interactive/functions-sequence.html#FUNC
> > TIONS-SEQUENCE-TABLE
>
>
> I don't see how that helps answer my problem.  I know how to update a
> sequence.  I want to pass a value for the restart vaue that depends on a
> query
> - all in one statement.  I would think it is a common problem i.e.
> migrating
> data.
>
>
use a subquery to set the value -

select setval('foo', (select max(some_id) from some_table))

It's all right there in the docs that you were pointed to. We try to
encourage people to be somewhat self sufficient around here.


Re: [SQL] Problem with pivot tables

2011-11-06 Thread Samuel Gendler
On Sun, Nov 6, 2011 at 9:06 AM, Gabriel Filipiak  wrote:

> I have problem with creating a pivot table in postgreSQL using crosstab
> function. It works well but it produces many records for the same
> client_id, how can I avoid it?
>
> Here is the SQL:
>
> SELECT * FROM crosstab('SELECT client_id,extract(year from date),
> sum(amount)  from orders group by extract(
> year from date), client_id','SELECT extract(year from date) FROM orders
> GROUP BY extract(year from date) order by extract(year from date)')
> AS orders(
> row_name integer,
> year_2001 text,
> year_2002 text,
> year_2003 text,
> year_2004 text,
> year_2005 text,
> year_2006 text,
> year_2007 text,
> year_2008 text,
> year_2009 text,
> year_2010 text,
> year_2011 text);
>
>
I think it assumes all client_id rows will occur together, so as soon as it
sees a different client_id, it moves to the next row.  If it then
encounters the original client_id, it creates yet another row for it.  Add
an order clause to your first query so that it will get all years for each
client_id sequentially.
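
In other words, only the source query needs to change - something like this
(untested, just your query with the ordering added):

SELECT client_id, extract(year from date), sum(amount)
  FROM orders
 GROUP BY client_id, extract(year from date)
 ORDER BY client_id, extract(year from date)

The category query and the column list can stay exactly as they are.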


Re: [SQL] new user on mac

2011-10-20 Thread Samuel Gendler
On Tue, Oct 18, 2011 at 2:47 PM, Scott Swank  wrote:

> I have a postgres 9.1 database up & running, no problem. Purely in
> terms of writing sql (ddl, dml & pg/plsql), what tools are
> recommended?
>
> Coming from an Oracle world, I'm thinking of toad, sql developer, etc.
>
> 1. psql & text editor of choice (if so, which one?)
>

Count me as one who has never felt the need for anything other than terminal
window, vim/emacs, a browser with the postgresql docs, and the psql client.
 I use pg_dump to get create statements if I need them, but mostly just the
\d command in psql to show me table structure without create statements.  It
is definitely worth exploring the various meta-commands in psql in order to
learn what is available.  Most of the psql documentation is devoted to them
- http://www.postgresql.org/docs/9.0/static/app-psql.html  Even if you use
pgAdmin, you'll be much better off if you know how to do more than just the
basics in psql.


Re: [SQL] ambiguous local variable name in 9.0 proc

2011-09-21 Thread Samuel Gendler
On Wed, Sep 21, 2011 at 4:49 PM, David Johnston  wrote:

> Not tested but I think all local variables are implicitly scoped to the
> function name so you should be able to do the following:
>
> WHERE reporting_mgmt.aggregate_timescales_impl.tbl_schema = e.tbl_schema
>

Yep.  It is also possible to change the db behaviour globally or on a
per-function basis

from
http://developer.postgresql.org/pgdocs/postgres/plpgsql-implementation.html

"To change this behavior on a system-wide basis, set the configuration
parameter plpgsql.variable_conflict to one of error, use_variable, or
use_column(where error is the factory default)."

You can also set the behavior on a function-by-function basis, by inserting
one of these special commands at the start of the function text:

#variable_conflict error
#variable_conflict use_variable

#variable_conflict use_column
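
For example, a minimal sketch (hypothetical function) of where the
per-function directive goes:

CREATE OR REPLACE FUNCTION table_exists(table_schema varchar, table_name varchar)
RETURNS boolean AS $$
#variable_conflict use_variable
BEGIN
    -- unqualified names resolve to the function's variables; the qualified
    -- t.* references still resolve to the columns
    RETURN EXISTS (
        SELECT 1
          FROM information_schema.tables t
         WHERE t.table_schema = table_schema
           AND t.table_name   = table_name);
END;
$$ LANGUAGE plpgsql;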


Sorry for jumping straight to a mailing list query.  It was actually relatively
easy to find in the docs.  Now to figure out if the function name scoping
trick works in 8.4 so that I can modify my procs prior to upgrading my db.

--sam


[SQL] ambiguous local variable name in 9.0 proc

2011-09-21 Thread Samuel Gendler
I've got a stored proc (which worked fine in 8.3 and 8.4) that is declared
as such:

CREATE OR REPLACE FUNCTION reporting_mgmt.aggregate_timescales_impl (
   div_start TIMESTAMP WITHOUT TIME ZONE,
*   tbl_schema VARCHAR, *
   tbl_root VARCHAR,
   fine_timescale VARCHAR,
   coarse_timescale VARCHAR,
   coarser_timescale VARCHAR,
   fact_fields VARCHAR,
   dim_fields VARCHAR,
   sum_fields VARCHAR)
RETURNS INTEGER
AS $$

Within that proc, I've got the following line:

IF EXISTS ( SELECT table_name FROM information_schema.tables
  WHERE table_schema = tbl_schema
AND table_name = tbl_fine_part_old ) THEN
IF EXISTS ( SELECT status FROM reporting_mgmt.etl_status AS e
* WHERE tbl_schema = e.tbl_schema*
   AND tbl_root = e.tbl_root
   AND div_start_old = e.fine_time
   AND coarse_timescale = e.coarse_scale
   AND status = 0 ) THEN

And in 9.0, it is generating the following error:

ERROR:  column reference "tbl_schema" is ambiguous
LINE 2:  WHERE tbl_schema = e.tbl_schema
  ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
QUERY:  SELECT EXISTS ( SELECT status FROM reporting_mgmt.etl_status AS e
WHERE tbl_schema = e.tbl_schema
  AND tbl_root = e.tbl_root
  AND div_start_old = e.fine_time
  AND coarse_timescale = e.coarse_scale
  AND status = 0 )
CONTEXT:  PL/pgSQL function "aggregate_timescales_impl" line 52 at IF
PL/pgSQL function "aggregate_timescales" line 23 at RETURN

Of course, it is true that tbl_schema could refer to the column in table
etl_status, except older versions of postgres seemed to correctly figure out
that comparing a column to itself isn't terribly useful, so it must be
referring to the pl/pgsql variable rather than the table column.

I'm happy to modify the proc definition, except that I am unsure how to do
so other than to rename the variable, which is my least favourite way to do
that.  I'd far rather qualify the name somehow, so that it knows that I am
referring to a local variable, if at all possible.  Suggestions?


Re: [SQL] Use select and update together

2011-09-13 Thread Samuel Gendler
On Mon, Sep 12, 2011 at 3:36 AM, Boris  wrote:

>
> SELECT (
> UPDATE tbl SET val = 1
> WHERE KEY = any('{0,1,2,3,4,5}'::int[])
> returning key
> );
>
> cause syntax error. Is any query of such type (update warped into
> select) is possible?
>
>
Just lose the select (); part.

update tbl set val = 1 where condition = true returning *;

That works just fine on 8.4.  You can also specify individual columns in the
returning clause.  The postgresql documentation covers the topic.

http://www.postgresql.org/docs/9.0/static/sql-update.html

It doesn't appear to work in a subquery, so you can't join to it (though you
can use a join within the update clause and then return the columns you need
from a joined table, I suspect).  Simply returning 1 or more columns from
updated rows works just fine, though.
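
For example, something like this (hypothetical second table) joins inside the
update and returns columns from the joined table:

UPDATE tbl
   SET val = 1
  FROM keys k
 WHERE tbl.key = k.key
RETURNING tbl.key, k.label;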


Re: [SQL] Want some basic compare of data type on PostgreSQL and MySQL

2011-09-01 Thread Samuel Gendler
On Wed, Aug 31, 2011 at 10:15 PM, bhavesh1385 wrote:

> Hello All,
>
> I Want some basic compare of data type on PostgreSQL and MySQL.
>
> I am Try to using create the database on PostgreSQL with the same query on
> MySQL then it will create a problem...
>
> I have make changes according to the Datatype of PostgreSQL and below are
> the syntax for the query.
>
> Into MySQL the Syntax Look like :-
> -
> CREATE TABLE IF NOT EXISTS `tbl_admin` (
>  `iadmin_id` int(11) NOT NULL AUTO_INCREMENT,
>  `vemail` varchar(255) NOT NULL,
>  `vname` varchar(255) NOT NULL,
>  `irole_id` int(11) NOT NULL,
>  `vpassword` varchar(255) NOT NULL,
>  `estatus` enum('Active','InActive') NOT NULL,
>  PRIMARY KEY (`iadmin_id`)
> ) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
>
>
>
> So i want to execute the same query into PostgreSQL and i try the same
> query
> but error occur..
>
> and i correct the syntax of the query according to PostgreSQL as following
> --
> CREATE TABLE tbl_admin1 (
>  iadmin_id integer NOT NULL,
>  vemail character varying,
>  vname character varying,
>  irole_id integer,
>  vpassword character varying,
>  PRIMARY KEY (iadmin_id)
> )
>
> ---
> NOTE :- Still I have some below confusion into following
>
> [1] How to make Primary Key as a Auto Increment...?
> [2] Suppose I want to put 'ENUM' data type then how i can do that ..?
> [3] Please suggest me basic Data type Comparesion between PostgreSQL and
> MySQL .. like :-
>
>  PostgreSQL          MySQL
>  ------------------  -------
>  character varying   VARCHAR
>  integer              INT
>
> like that i want all the possible data type comparision.
>


First, the documentation for postgresql is very comprehensive and available
here: http://www.postgresql.org/docs/9.0/interactive/index.html

More specifically, the documentation for data types is here:
http://www.postgresql.org/docs/9.0/interactive/datatype.html

To create an auto incrementing column, use type 'serial' for 4 byte signed
values and 'bigserial' for 8 byte values.  Postgresql will create a sequence
and use that sequence to generate a value on insert if no value is provided.
 There is documentation for the serial shorthand here:
http://www.postgresql.org/docs/9.0/interactive/datatype-numeric.html#DATATYPE-SERIAL

varchar and int are part of the sql standard and can be used as-is in
postgresql.  You can't specify the length of the string representation of an
integer via int(11) but you can specify constraints for max and min values,
if required.  Use bigint for 8 byte integers.  Also, be sure to read the
description of the varchar datatype so that you understand the differences
in performance compared to mysql, especially in regard to using varchar
instead of text types.

There is documentation for enum types here:
http://www.postgresql.org/docs/9.0/interactive/datatype-enum.html

Given the available values for your estatus enum, you may want to just use
the built-in boolean datatype and rename the column to 'active'.

CREATE TYPE status_enum AS ENUM ('Active', 'InActive');

CREATE TABLE tbl_admin (
 iadmin_id serial NOT NULL PRIMARY KEY,
 vemail varchar(255) NOT NULL,
 vname varchar(255) NOT NULL,
 irole_id bigint NOT NULL CHECK (irole_id >= -99999999999 AND irole_id <= 99999999999),
 vpassword varchar(255) NOT NULL,
 estatus status_enum NOT NULL
);

However, I imagine that irole_id is actually a foreign key to another table,
in which case you don't really want the check constraint in this table, but
you do want an explicit foreign key. Note that I changed the type to bigint
in order to accommodate your original int(11) datatype, since an 11-digit
value is too large for a 4-byte integer. To declare the irole_id column as a
foreign key to the tbl_role table, using an integer type, do the following:

CREATE TABLE tbl_admin (
 iadmin_id serial NOT NULL PRIMARY KEY,
 vemail varchar(255) NOT NULL,
 vname varchar(255) NOT NULL,
 irole_id integer references tbl_role(irole_id),
 vpassword varchar(255) NOT NULL,
 estatus status_enum NOT NULL
);

Documentation of foreign keys is at:
http://www.postgresql.org/docs/8.3/static/ddl-constraints.html#DDL-CONSTRAINTS-FK

In future, please make an effort to discover the answers to such basic
questions in the documentation for yourself.  No one wants to spend a lot of
time answering questions that have already been well documented elsewhere.


Re: [SQL] Need a little help with geometric query

2011-08-25 Thread Samuel Gendler
On Thu, Aug 25, 2011 at 3:56 PM,  wrote:

> This is my first shot at using postgresql's geometric functions
> I have a passel of lines defined by x1 y1 x2 y2
> I created a table with type lseg and loaded it
> I start by finding the line that has an end closest to 0,0
> Now I want to find the next line that has an end closest to the first
> line's other end and so on
> I want to find the shortest path
> I see there are a lot of geometric operators but I am a bit baffled
> Can someone offer suggestions or reading references
>

Forget about the built-in geometry types and operators in postgres.  You
really want to take a look at postGIS for doing geometric calculations.  It
has a much more diverse set of operators and functions and adheres to
various GIS standards which will allow your db to interact with other GIS
tools as well. It is quite easy to learn and there is an excellent book
which, I believe, also has an electronic edition available so you can start
reading immediately.  Finding a geometry that is 'closest' to a particular
point or to another 2D geometry (or even 3D in the upcoming version) is
relatively trivial.  I'm sure there is a function, or at least a common
technique, for finding a shortest path between 2 points, since GIS is often
used for mapping applications. It's not my specialty, so I'll leave the
details to your research.


Re: [SQL] need magic to shuffle some numbers

2011-08-23 Thread Samuel Gendler
I don't have time to experiment with actual queries, but you can use the
rank() window function to rank rows with prio sorted ascending, and do the
same thing to rank rows with prio sorted descending, and update rows with
the value from the second where the rank matches the rank from the first.
 I'm guessing that you can't use a window function in an update directly, so
you'll likely need to structure things as several subselects, but the basic
principle is sound, I think.  If you do have more than one row with the same
value, you can use row_number() instead of rank() in order to get unique
'rank' for rows that have the same value.

Window functions are here:
http://www.postgresql.org/docs/9.0/static/tutorial-window.html.

Build a query that returns primary_key1, rownum1, prio1 with prio sorted
ascending.  Do the same for rownum2, prio2 with prio sorted descending.
 Then join those two queries in an outer query on rownum1 = rownum2.  Use
that query as part of an update statement to set prio = prio2 where
primary_key = primary_key1.  You can likely do it more efficiently by
combining things into a single query, but you are only going to run this
once, and that was easier to describe, textually.
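
I haven't tested it, but the update would look roughly like this (using the
id/prio/group_id columns from your description; the group_id value is just an
example):

UPDATE some_table t
   SET prio = m.new_prio
  FROM (SELECT a.id, b.prio AS new_prio
          FROM (SELECT id, prio,
                       row_number() OVER (ORDER BY prio ASC) AS rn
                  FROM some_table WHERE group_id = 42) a
          JOIN (SELECT id, prio,
                       row_number() OVER (ORDER BY prio DESC) AS rn
                  FROM some_table WHERE group_id = 42) b
            ON a.rn = b.rn) m
 WHERE t.id = m.id;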



On Tue, Aug 23, 2011 at 12:49 PM, Andreas  wrote:

> Hi,
>
> there is a table that has among others a integer primary key "id" and
> another integer column "prio" as well as an integer "group_id".
>
> I'd like to invert the values of the prio-column for one of the groups.
> The prio numbers start with 3 and there are 1159 different prios in this
> group.
> At least every value appeares only once.   :)
>
> Is there an elegant way to switch the prio values around so that every
> record with the first prio gehts the last and vice versa?
> Then the records with the second smallest prio get the second-to-last
> biggest value and v.v.
> ...
>
>
> regards
>
>


Re: [SQL] exclusion constraint for ranges of IP

2011-08-23 Thread Samuel Gendler
On Tue, Aug 23, 2011 at 1:27 AM, Herouth Maoz  wrote:

> My thanks to everyone who replied.
>
> I have decided not to implement that constraint at this time. Using a
> compound type will make the system more complicated and less readable, plus
> requires installing the package which is beyond vanilla PostgreSQL.
>
> Now I have another exclusion constraint I'm thinking about in another and I
> want to verify that this will do what I mean it to do:
>
> CREATE TABLE invoice_definitions
> (
> id  SERIAL  PRIMARY KEY NOT NULL,
> customer_id INTEGER NOT NULL REFERENCES customers(id),
> is_default  BOOLEAN NOT NULL DEFAULT FALSE,
> bill_descriptionVARCHAR(100)NOT NULL,
> itemized_description VARCHAR(100)   NOT NULL,
> EXCLUDE USING GIST  ( customer_id WITH =, is_default WITH AND )
> )
> ;
>
> Basically, each customer can have several rows in this table, but only one
> per customer is allowed to have is_default = true. Is this exclude
> constraint correct?
>

You can validate this yourself with 3 insert statements into the table
declared in your email.
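
For example (hypothetical values) - if the constraint does what you intend,
the third insert should fail with an exclusion violation:

INSERT INTO invoice_definitions (customer_id, is_default, bill_description, itemized_description)
VALUES (1, true, 'default bill', 'default item');
INSERT INTO invoice_definitions (customer_id, is_default, bill_description, itemized_description)
VALUES (1, false, 'other bill', 'other item');
INSERT INTO invoice_definitions (customer_id, is_default, bill_description, itemized_description)
VALUES (1, true, 'second default', 'second default');  -- should be rejected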


Re: [SQL] Stuck Up In My Own Category Tree

2011-08-11 Thread Samuel Gendler
On Thu, Aug 11, 2011 at 8:39 AM, Don Parris  wrote:

> Hi all,
>
> Note: I'm happy to read howtos, tutorials, archived messages - I just
> haven't found anything that addresses this yet.  I found a related topic on
> the novice list, but my post got no response there.  I've been struggling
> with this for about a week now and need to figure out a solution.  Heck,
> this may not even be the best approach to hierarchical structures, but it
> sure seemed reasonable when I first read up on the subject.  Anyway...
>
> I created a category table like so (I got the idea from a website somewhere
> that used it in a different way, and did not discuss much about addressing
> what I want to accomplish):
> cat_id(serial) | cat_name(varchar) | parent_id(int) | lineage(varchar) | deep(int)
> 1              | root_cat_a        | Null           | 1                | 1
> 2              | sub_cat_1         | 1              | 1-2              | 2
> 3              | sub_sub_cat_a     | 2              | 1-2-3            | 3
>
> I use this to categorize transactions, and use the most appropriate
> subcategory for any given transaction item in a table called trans_details.
> I can easily show transaction amounts by sub-category  (SELECT cat_name,
> sum(amount) AS "amount" FROM category, trans_details WHERE category.cat_id =
> trans_details.cat_id):
>
> cat_name |amount
> Transportation: Auto: Fuel |  $100
> Transportation: Auto: Maint|  $150
> Transportation: Fares: Bus|  $40
>
> but what I cannot figure out is how to create a summary where I show
> cat_name|amount
> Transportation: Auto |  $250
>
> or, what I *really* want:
> cat_name|amount
> Transportation |  $290
>
>
> Can anyone help me work through this?  Frankly, I'm not even sure where to
> begin to solve the problem.  I have been trying the WITH RECURSIVE feature,
> but I do not understand very well how to apply it.  The example query I have
> brings up an empty result set and I don't have a clue how I could modify it
> to make it work.  I'm not even sure it's the best method, but it seems like
> a reasonable approach.  I thought about using regexp to try and match the
> initial part of the lineage to the category_id of the parents, something
> like:
> WHERE c1.category_id = regexp_matches(c2.lineage, 'nnn-'), but lineage is a
> text type column, rather than an int, and would need to be cast.
>
> One of the problems I encounter is that a root category with no
> sub-categories (naturally) won't show up in the category_id = parent_id
> matches, since such a category has no children.
>


http://old.storytotell.org/blog/2009/08/11/postgresql84-recursive-queries.html

That link appears to answer all of your potential questions - how to render
the hierarchy, how to find all children of a node, etc.
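
As a starting point, an untested sketch using the column names from your
message - it tags every category with its root ancestor and then sums amounts
per root:

WITH RECURSIVE cat_tree AS (
    SELECT cat_id, cat_name AS root_name
      FROM category
     WHERE parent_id IS NULL
    UNION ALL
    SELECT c.cat_id, ct.root_name
      FROM category c
      JOIN cat_tree ct ON c.parent_id = ct.cat_id
)
SELECT ct.root_name, sum(td.amount) AS amount
  FROM cat_tree ct
  JOIN trans_details td ON td.cat_id = ct.cat_id
 GROUP BY ct.root_name;

Grouping at the second level instead (e.g. Transportation: Auto) just means
starting the recursion one level down.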


Re: [SQL] Calculating relative time/distance from CURRENT_TIMESTAMP

2011-08-04 Thread Samuel Gendler
On Thu, Aug 4, 2011 at 4:42 AM, Gnanakumar  wrote:

>  Hi Amitabh,
>
> Yes, I already took a glance of the Date/Time functions.  But what I'm
> expecting is something more meaningful/user-friendly value to be returned
> than from the actual return value of these functions available here.  I'm
> just finding out whether this could be solved at query level itself.
>
> For example, there is age(timestamp, timestamp) function.  If I call like
> this: select age(current_timestamp, (current_timestamp - interval '1 day')),
> it returns "1 day".  But what I'm expecting to be returned from the function
> is something more meaningful/user-friendly to the end users.  In this case, I
> wanted it to return "yesterday", similarly today, 15 minutes ago, 1 week
> ago, etc.
>

If you really want a flexible function for doing this, I suspect you could
replicate the functionality of jquery's 'timeago' module in a stored proc
relatively easily. http://timeago.yarp.com/ It is MIT licensed, so you can
copy the logic without restriction.  It makes reference to being derivative
of a ruby project, so you could probably grab that code and convert it to
ruby-pg very easily.
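
Just to give a flavour, an untested, hypothetical fragment of the kind of CASE
logic such a function ends up containing:

CREATE OR REPLACE FUNCTION friendly_age(ts timestamptz) RETURNS text AS $$
  SELECT CASE
    WHEN now() - $1 < interval '1 minute' THEN 'just now'
    WHEN now() - $1 < interval '1 hour'
         THEN extract(minute FROM now() - $1)::int || ' minutes ago'
    WHEN now() - $1 < interval '1 day'
         THEN extract(hour FROM now() - $1)::int || ' hours ago'
    WHEN now() - $1 < interval '2 days' THEN 'yesterday'
    ELSE to_char($1, 'YYYY-MM-DD')
  END;
$$ LANGUAGE sql STABLE;

The timeago code handles many more cases (weeks, months, pluralization and so
on), but the structure is the same.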


Re: [SQL] group by with sum and sum till max date

2011-07-06 Thread Samuel Gendler
On Tue, Jul 5, 2011 at 10:42 AM, M. D.  wrote:

> This is a little hard to explain, and I'm not sure if it's possible, but
> here goes.
>
> This is my query:
> select year, month,
> (select number from account where account.account_id =
> view_account_change.account_**id) as number,
> (select name from account where account.account_id =
> view_account_change.account_**id) as account,
> sum(amount) as amount
> from view_account_change
> where view_account_change.change_**date >= '2010-01-01'
> group by year,month,  number, account
> order by year,month, number, account
>
> I want to make an exception for the sum so that if the account number is
> less than 4000, I want a sum of all transactions until the last date of the
> group by.
>
> the query for that would be:
> Select sum(amount) from view_account_change where change_date > "max date
> in the group"
>

I think you are looking for a window function, but I'm not sure about using
a value computed over a window in the where clause. You may have to do
something somewhat complicated with a subquery, but you can definitely
compute 'max date in the group' via a window function:

http://www.postgresql.org/docs/9.0/static/tutorial-window.html
http://www.postgresql.org/docs/9.0/static/functions-window.html
http://www.postgresql.org/docs/9.0/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

You'll likely have to split it into a UNION of 2 queries - one for account
numbers lower than 4000 and the other for the rest.

Perhaps select the max date in the group in a subquery which you then join to
in an outer query.  That should be enough to start experimenting with, anyway.
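
For instance (untested), something along these lines attaches the latest
change_date of each year/month/account group to every row, which you could
then filter against or join to in an outer query:

SELECT account_id, year, month, change_date, amount,
       max(change_date) OVER (PARTITION BY year, month, account_id) AS max_date
  FROM view_account_change
 WHERE change_date >= '2010-01-01';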


Re: [SQL] interesting sequence

2011-07-05 Thread Samuel Gendler
On Tue, Jul 5, 2011 at 2:42 PM, John Fabiani  wrote:

> On Tuesday, July 05, 2011 01:11:11 pm Kevin Crain wrote:
> > You don't need a loop there.  Assuming your order id field is of type
> > varchar you can just build the first part of your string and then do a
> > count to get the last part using a LIKE comparison:
> >
> > select count(id_order) + 1  from sometable WHERE id_order LIKE 'O-20110704 -%';
> >
> > If you do this inside a function it will be like running it in a
> > transaction so you shouldn't have to worry about it being a multi-user
> > system.
> >
> >
> >
>
> I like this - looks better than what I'm currently doing.  Thanks
> Johnf
>
>
It is simpler, but it will result in id collisions if two inserts run at the
same time, particularly if the count query takes a while to run, so be
prepared to handle that.  Make sure you have an index which can satisfy that
count query quickly.  If you are not using the C locale for your database,
that means you must create an index on that column that uses
text_pattern_ops or varchar_pattern_ops (depending on if it is text or
varchar column) so that postgresql can use the index for that comparison,
otherwise LIKE clauses will force a sequential scan of the whole table every
time.  C locale does byte by byte text comparison, so the special index
isn't required.

http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html
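
For example, something like this (using the table and column from the earlier
message):

CREATE INDEX sometable_id_order_pattern_idx
    ON sometable (id_order varchar_pattern_ops);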


Re: [SQL] interesting sequence

2011-07-05 Thread Samuel Gendler
On Tue, Jul 5, 2011 at 6:52 AM, John Fabiani  wrote:

> Hi,
>
> I have a special need to create a sequence like function.
>
> "O-20110704 -2"  which is
> "O" for order (there are other types)
> "20110704" is for July 4, 2011
> '2' the second order of the day for July 4, 2011
>
> I of course can get the type and date.  What I don't know is how to get is
> the
> last number.  It would seem to be that I would need a loop to determine if
> the
> next number existed.
>
> LOOP
> --Check to see if the string exist in a table
> -- count = count +1
> -- until I don't find the string
> END LOOP;
>
> but then I thought I could do something like
>
> for $1 in  (select string from sometable)
>  LOOP
>  count = count + 1
>
> or something like this
>
> for i in 1..999 LOOP
>  -- check for the existence of the string in a table using 'i'
> -- there will never be 999 orders in one day.
> END LOOP
>
>
> So here is the question what would be the best way for a multi-user system?
> If someone has a better thought - it would be helpful.
>
> BTW I did NOT design the number - in fact it seems silly to me.
>

I'd probably do the following.  Create a table to hold the current date as a
string appropriate for use in ids.  I'd also create a sequence for each of
the id types. I'd set up a cron job (or equivalent) to run at midnight which
updates the date and resets all of the sequences to 1 within a transaction.
 You can probably do all of it in a single query.
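
The nightly reset itself is tiny - an untested sketch with hypothetical names:

BEGIN;
UPDATE date_table SET datestring = to_char(current_date, 'YYYYMMDD');
SELECT setval('order_id_sequence', 1, false);  -- next nextval() returns 1
COMMIT;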

Then I'd do inserts which generate the id by concatenating the type initial
with the date and a sequence, probably in an insert trigger on the table if
you are ok with server generated ids. Otherwise, you could do insert with a
subquery which generates the id:

insert into order (id, x, y) values ((select 'O-' || d.datestring || '-' ||
nextval('order_id_sequence') from date_table d), 'x_value', 'y_value');

If you are using hibernate or some other ORM, you can surely use an insert
trigger to generate the id and tell the ORM to use a server generated id.

sequence documentation is here:
http://www.postgresql.org/docs/8.1/static/functions-sequence.html


Re: [SQL] a strange order by behavior

2011-06-22 Thread Samuel Gendler
On Wed, Jun 22, 2011 at 8:57 PM, Pavel Stehule wrote:

> 2011/6/22 Peter Eisentraut :
> > On ons, 2011-06-22 at 02:39 -0700, Samuel Gendler wrote:
> >> Pavel suggested using a collation of ucs_basic, but I get an error
> >> when I
> >> try that on linux:
> >>
> >> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> >> createdb: database creation failed: ERROR:  invalid locale name
> ucs_basic
> >
> > ucs_basic is a collation name, which is an SQL object.  The argument of
> > createdb --lc-collate is an operating system locale name.  You can't mix
> > the two, even though they are similar.
> >
>
> ok, what I can to select, when I would to use a C like default order?
>
>
You can specify 'order by f1 using ~<~' or 'order by f1 using ~>~' in your
query.  If you want your queries to use an index when ordering or comparing
strings via LIKE, then you need to create a separate index that uses
varchar_pattern_ops instead of the default varchar_ops.  The db will
automatically use the varchar_pattern_ops index when appropriate.
 varchar_pattern_ops uses C-like collation.

create index t1_f1_pattern on t1 (f1 varchar_pattern_ops)


Re: [SQL] a strange order by behavior

2011-06-22 Thread Samuel Gendler
On Wed, Jun 22, 2011 at 3:39 AM, Samuel Gendler
wrote:

>
>
> On Wed, Jun 22, 2011 at 3:01 AM, Pavel Stehule wrote:
>
>> Hello Peter
>>
>>
>> > Pavel suggested using a collation of ucs_basic, but I get an error when
>> I
>> > try that on linux:
>> > $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
>> > createdb: database creation failed: ERROR:  invalid locale name
>> ucs_basic
>>
>> isn't this a bug in collations?
>>
>
> The more I read about this, the more this would appear to be the case. It
> looks like the SQL standard has some baseline collations that are required
> and it isn't at all clear how one would access those in postgres if the host
> in question doesn't have those locale's defined on the host. UCS_BASIC is a
> SQL collation, but doesn't appear to have an explicit definition on a
> 'standard' linux host (CentOS 5, in my case).  There is another SQL
> collation called 'UNICODE' which is supposed to obey the Unicode Collation
> Algorithm with the Default Unicode Collation Element Table defined in
> Unicode10.  It looks like that collation is relatively sensitive to
> language-specific sort orders, though it isn't a required collation in the
> sql standard.  I suspect that it is the UNICODE collation which actually
> would appear to be the most 'sensible' within the context of this discussion
> - characters in expected order, spaces honoured, case sensitive. I have so
> little experience with localization that I'm not sure if I'm reading this
> all correctly, though.
>
>
>

My final bit of research before going to bed -

If you are stuck with UTF-8 collation, you can do C-style collation in order
by clauses like this:

select * from t1 order by f1 using ~<~
select * from t1 order by f1 using ~>~

If you want it to be able to use an index for that (or you want to use LIKE
in your where clause with that column), then you need to have an index which
uses C-style collation.  You can create an index on the column using
text_pattern_ops or varchar_pattern_ops and then LIKE, <, and > comparisons
will use that index and do comparisons like C collation, as will order by
clauses with ~>~ and ~<~

create index t1_f1_pattern ON t1 (f1 varchar_pattern_ops);
create index t1_f1 ON t1 (f1);

If you have both of those indexes, postgresql will, apparently, do the
correct thing based on which operations you ask of it in your query.  The
pattern_ops thing is mentioned in the docs (but on a page about indexes
rather than on a page about varchar and text types) and I found the ~>~
operator in an ORDER BY clause in an old mailing list email.  It may be in
the docs, I'm not sure.  Is there a documentation page or wiki page
consisting of 'stuff that may surprise you in postgresql?'  Such a page
would be a handy place to point this kind of thing out to new users.  Most
users, even relatively experienced ones, probably haven't read the
documentation cover to cover.  I certainly hadn't.

9.1 allows collation to be specified on a per-column basis, but since all of
the utf-8 collations ignore punctuation and space characters, you'll still
have to do C-style collation to get alphabetical order that honors
whitespace, punctuation, and case, but then non-ascii characters will be in
the wrong order. So you can either have language sensitive sorting or
punctuation- and case- sensitive sorting, but never both - at least not
without defining a custom locale. Punting this defect to the OS and saying
it has to be fixed there seems like a somewhat weak answer - at least
without offering up a source for alternative locale definitions that do
sorting according to UNICODE collation.

I'm actually surprised that european users aren't complaining about this all
the time, but maybe european users are used to seeing things ordered in a
manner which doesn't honour the 'correct' ordering of accented characters.
 Actually, I wonder if the probable explanation of the lack of complaint is
the fact that the kinds of fields most apps would tend to do alphabetical
sorts on probably don't tend to have lots of punctuation other than spaces,
so perhaps the language sensitive sorts are deemed sufficient because most
people don't notice the funky behaviour with punctuation and whitespace
while case-insensitive sort is probably desired most of the time.


Re: [SQL] a strange order by behavior

2011-06-22 Thread Samuel Gendler
On Wed, Jun 22, 2011 at 3:01 AM, Pavel Stehule wrote:

> Hello Peter
>
>
> > Pavel suggested using a collation of ucs_basic, but I get an error when I
> > try that on linux:
> > $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> > createdb: database creation failed: ERROR:  invalid locale name ucs_basic
>
> isn't this a bug in collations?
>

The more I read about this, the more this would appear to be the case. It
looks like the SQL standard has some baseline collations that are required
and it isn't at all clear how one would access those in postgres if the host
in question doesn't have those locales defined on the host. UCS_BASIC is a
SQL collation, but doesn't appear to have an explicit definition on a
'standard' linux host (CentOS 5, in my case).  There is another SQL
collation called 'UNICODE' which is supposed to obey the Unicode Collation
Algorithm with the Default Unicode Collation Element Table defined in
Unicode10.  It looks like that collation is relatively sensitive to
language-specific sort orders, though it isn't a required collation in the
sql standard.  I suspect that it is the UNICODE collation which actually
would appear to be the most 'sensible' within the context of this discussion
- characters in expected order, spaces honoured, case sensitive. I have so
little experience with localization that I'm not sure if I'm reading this
all correctly, though.


Re: [SQL] a strange order by behavior

2011-06-22 Thread Samuel Gendler
On Wed, Jun 22, 2011 at 2:39 AM, Samuel Gendler
wrote:

>
> I was able to create the db with --lc_collate=C and get case-sensitive
> sorting that treats spaces 'correctly,' but I have no idea how reliable that
> is with multibyte characters and it almost certainly doesn't handle accented
> characters correctly in languages that have a mix of ascii and non-ascii
> characters, since the non-ascii chars will all sort as greater than the
> ascii chars, which is probably not how things are alphabetized in those
> languages.
>
>
'locale -a' at a command prompt will list all locales available on a host.
I guess there is no universal utf-8 collation, which isn't that surprising,
given how utf-8 works, though I had hoped that there might be one that at
least attempts to handle all languages that share character codes in some
kind of sensible manner.  Perhaps the details of UTF-8 encoding make that
effectively impossible.  But it looks as though if you want to handle
accented characters, you would need to set collation to something like
fr_FR.UTF-8, though it looks like all of the UTF-8 collations ignore spaces
and are case-insensitive.  I don't know of a way to determine what the
sorting behaviour of an arbitrary collation might be other than
experimentation.  You can set LC_ALL to the locale in question and pipe a
test file through the sort command to do a quick test:

LC_ALL=en_US.UTF-8 sort < test.txt


I checked variants like .UTF-8, .iso88591, .iso885915@euro and they
all ignore spaces in all of the languages I checked.  It appears that
the only way to get a space-sensitive sort order is to use C
collation.  Same goes for capitalization, I think.  Which is pretty
ugly, if you ask me.

It looks like it is theoretically possible to modify a locale's
collation behaviour.  On my linux system, all locales with the
space-ignoring, capital-ignoring collation have this entry in the
locale definitions in /usr/share/i18n/locales:

LC_COLLATE
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"
END LC_COLLATE

Looking at the iso14651_t1 file, I see this:

# Any character not precisely specified will be considered as a special
# character and considered only at the last level.
# .. IGNORE;IGNORE;IGNORE;..
#
# SYMB.N° GLY
#
 IGNORE;IGNORE;IGNORE; # 32 
 IGNORE;IGNORE;IGNORE; # 33 _
 IGNORE;IGNORE;IGNORE; # 34 <"_>
 IGNORE;IGNORE;IGNORE; # 35 - (MACRON)
 IGNORE;IGNORE;IGNORE; # 36 
 IGNORE;IGNORE;IGNORE; # 37 -
 IGNORE;IGNORE;IGNORE; # 38 ,
 IGNORE;IGNORE;IGNORE; # 39 ;
 IGNORE;IGNORE;IGNORE; # 40 :



There doesn't seem to be any obvious discussion of how to structure the
LC_COLLATE section of a locale definition, but it is clear that telling it
to ignore spaces and punctuation is the problem we are seeing.  That seems
like a very strange choice to make as the default for all UTF-8 locales.  I
can't find the locale definitions for OS X, which might reveal a more
sensible LC_COLLATE or may just be defaulting back to C collation for
everything.


Re: [SQL] a strange order by behavior

2011-06-22 Thread Samuel Gendler
Interesting.  The original thread to which I was referring has a subject of
"Sorting Issue" and the original request showed a list of vehicle model
names which were sorting as though there were no spaces. The user had
collation set to en_US.UTF-8.  However, my database (on OS X) sorts both his
example and the example that started this thread correctly, despite my
lc_collate being set to the same value.

Then I just ran the exact same test on a Centos 5 linux host (postgresql 8.4
as well) and I am seeing the 'erroneous' sort order that prompted both this
thread and the other.  So you can't even assume the same behaviour for the
same collation on different platforms.

On OS X:

# \l
   Name    | Owner | Encoding |  Collation  |    Ctype    | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
 col_test  | u1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

# show lc_collate;
 lc_collate
-
 en_US.UTF-8

# select * from t1 order by f1;
 id |f1
+---
  1 | CX Hatchback
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CXS Sedan

On CentOS 5:

# \l
   Name    | Owner | Encoding |  Collation  |    Ctype    | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
 col_test  | u1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

# show lc_collate
ecorithm-# ;
 lc_collate
-
 en_US.UTF-8

# select * from t1 order by f1;
 id |f1
+---
  1 | CX Hatchback
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXS Sedan

Further testing would seem to reveal that OS X is using something resembling
C collation order, despite the fact that it says en_US.UTF-8. I say this
because it is also case sensitive.  CentOS, on the other hand, is ignoring
spaces, but is also case-insensitive.

# select * from t1 order by f1 asc;
 id |f1
+---
  1 | CX Hatchback
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | cx Sedan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXS Sedan

Pavel suggested using a collation of ucs_basic, but I get an error when I
try that on linux:

$ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
createdb: database creation failed: ERROR:  invalid locale name ucs_basic

I was able to create the db with --lc_collate=C and get case-sensitive
sorting that treats spaces 'correctly,' but I have no idea how reliable that
is with multibyte characters and it almost certainly doesn't handle accented
characters correctly in languages that have a mix of ascii and non-ascii
characters, since the non-ascii chars will all sort as greater than the
ascii chars, which is probably not how things are alphabetized in those
languages.

I happen to have my computer set so I can type cyrillic and it appears to
work correctly with non-ascii chars when the entire charset is non-ascii,
but it'd be a lot of effort to construct a test that utilizes a mix of
1,2,3, and 4 byte characters, since I'm not familiar with enough alphabets.

test=# select * from t1 order by f1;
 id |f1
+---
  1 | CX Hatchback
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CXS Sedan
  2 | cx Sedan
  2 | АИВЕТ
  2 | П АИВЕТ
  2 | П РИВЕТ
  2 | П СИВЕТ
  2 | ПРИВЕТ
  2 | РИВЕТ
  2 | СИВЕТ
  2 | привет


Re: [SQL] a strange order by behavior

2011-06-22 Thread Samuel Gendler
On Tue, Jun 21, 2011 at 3:37 AM, Eyal Wilde  wrote:

> the database collation is: en_US.UTF-8
>
> drop table t1;
> create table t1 (recid int ,f1 varchar(20));
> insert into t1 values (1,'a');
> insert into t1 values (2,' ');
> insert into t1 values (3,'aa');
> insert into t1 values (4,' a');
> select * from t1 order by f1
>
> result:
>
> recid  f1
> 2  " "
> 1  "a"-- 2 comes before 1 because space is smaller then 'a'.
> fine.
> 4  " a"   -- now is see that 1 comes before 4 because space is
> greater then 'a' !?
> 3  "aa"   -- now again, 4 comes before 3 because space is smaller
> the 'a' !?!
>
>
I seem to recall a thread here about it ignoring spaces entirely in that
collation (and maybe ignoring capitalization, too?).  I went to go test that
assertion by initializing a database with C collation and got some complaint
about it being incompatible with my template1 template database.  I
initialized a db off of template0 and then got side tracked and you've only
just reminded me of it.  I was planning to test whether it is safe to use
UTF-8 for encoding but use C collation, and then maybe investigate other
collations.

This worked:

createdb  -E UTF-8 --lc-collate=C some_db

so it should be easy enough to play around with it some.  I'm not sure how
to get a list of valid collations for any given charset, and it seems like C
collation would generate somewhat strange results with non-ascii characters
(french accented characters are supposed to alphabetize in some unexpected
manner, I believe), so there must be something better - closer to UTF-8
collation but without ignoring whitespace and such.  A quick google search
reveals that there is some kind of standard for unicode collation (
http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what is
represented by the en_US.UTF-8 collation or not.  I've got no real
experience with this stuff.

It appears that there are differences regarding collation in recent versions
- the beta docs for 9.1 show that you can set collation on individual
operations or differently for different columns (
http://www.postgresql.org/docs/9.1/static/collation.html ).  I see nothing
like that in 8.4 docs.

It definitely looks like we both need to have a read of the localization
chapter of the docs for our database version as there is a bunch of stuff in
there that I was surprised to read when I just did a quick scan - like using
anything but C or posix is much slower and can produce incorrect results in
a 'like' query

It looks like the docs prior to 9.1beta have no real reference to collation
at all, so it's down to trial and error unless someone in the know speaks
up.

--sam


Re: [SQL] Append n Rows into a Single Row

2011-06-18 Thread Samuel Gendler
Actually, you need the array_agg() function to aggregate multiple rows into
a single array, and that is discussed on the aggregate functions page,
here: http://www.postgresql.org/docs/9.0/static/functions-aggregate.html

You could then use one of the other array functions to manipulate that
array.

But given that you are appending strings to each other in your example, you
can also just use the string_agg function from that same page - assuming you
are on 9.0.  It looks like that function isn't available before 9.0
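
For example (hypothetical table and column names):

SELECT string_agg(name, ' || ') FROM people;                   -- 9.0 and later
SELECT array_to_string(array_agg(name), ' || ') FROM people;   -- pre-9.0 equivalent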

--sam



On Sat, Jun 18, 2011 at 8:33 PM, Rob Sargent  wrote:

> Look at the array aggregation functions here:
> http://www.postgresql.org/docs/9.0/static/functions-array.html
>
>
> Tripura wrote:
>
>> Hi,
>> I am totally new to PostgreSQL and this is my first script.
>> Can anyone please help me with my following requirement
>> I have script that returns  1 column and 40 rows.
>> My requirement is update this existing script to show all the 40 rows data
>> in a single row as a string.
>>
>> For example The current script returns, something like this Tom
>> Tad
>> Max
>> Ted
>>
>> I have to update the existing script to show the above results as follow
>> Tom || Tad || Max || Ted
>> Something like colace in Ms sql
>>
>> Any help is really appriciated.
>>
>> Thankyou in Advance.
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://postgresql.1045698.n5.nabble.com/Append-n-Rows-into-a-Single-Row-tp4500306p4500306.html
>> Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
>>
>>
>>
>
>


Re: [SQL] Append n Rows into a Single Row

2011-06-18 Thread Samuel Gendler
Look at the documentation for the tablefunc contrib module. It provides a 
crosstab() function which will allow you to pivot multiple rows into multiple 
columns in a single row, with all rows that share a particular key in one or 
more columns pivoting up into the same row, but different keys ending up as 
separate rows. You could then string concatenate all columns of a single row 
into a single string if desired,

Sent from my iPhone

On Jun 17, 2011, at 17:20, Tripura  wrote:

> Hi,
> I am totally new to PostgreSQL and this is my first script.
> Can anyone please help me with my following requirement 
> 
> I have script that returns  1 column and 40 rows.
> My requirement is update this existing script to show all the 40 rows data
> in a single row as a string.
> 
> For example 
> The current script returns, something like this 
> Tom
> Tad
> Max
> Ted
> 
> I have to update the existing script to show the above results as follow
> Tom || Tad || Max || Ted 
> 
> Something like colace in Ms sql
> 
> Any help is really appriciated.
> 
> Thankyou in Advance.
> 
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://postgresql.1045698.n5.nabble.com/Append-n-Rows-into-a-Single-Row-tp4500306p4500306.html
> Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
> 


Re: [SQL] finding gaps in temporal data

2011-06-15 Thread Samuel Gendler
On Wed, Jun 15, 2011 at 10:23 AM, Samuel Gendler
wrote:

> I have a fact table that looks like this:
>
> dim1_fk bigint,
> time_fk bigint,
> tstamp timestamp without timezone
> value numeric(16,2)
>
> The tstamp column is completely redundant to a colume in the time_utc
> table, but I like to leave it there for convenience when writing ad-hoc
> queries in psql - it allows me to skip the join to a time dimension table.
>  The fact table is actually partitioned into 1-month child tables, with
> check constraints on both time_fk and the tstamp column, since there is a
> 1:1 relationship between those columns.
>
> dim1_fk refers to a dim1 table which has two varchar columns we care about:
>
> dim1_pk bigint,
> label1 varchar,
> label2 varchar
>
> the time_utc table contains the usual time dimension columns, and I've
> structured the primary key to be an integer in the form MMDDHH24mm - so
> 2011-06-15 15:35 has a primary key of 201106151535 and there is a row every
> 5 minutes. All data in the fact table is assigned to a given 5 minute
> window. There is a row in the time_utc table for every possible time value,
> regardless of whether there is data in a fact table for that 5 minute
> interval.  For our purposes, we only need care about 2 columns
>
> time_pk bigint,
> tstamp timstamp without time zone
>
> I'm looking to run a report which will show me any gaps in the data for any
> label1/label2 pair that appears in the dim1 table - there are 0 or more rows
> for each label1/label2 combination in each 5 minute window and I don't
> actually care about the duplicates (all my queries aggregate multiple rows
> for a given timestamp via the avg() function).
>

OK, I figured this one out on my own.  It looks like this (plus a union all
to a query to show entries from dim1 for which there is no data at all):

SELECT d2.label1,
   d2.label2,
   tstamp - gap as start_time,
   tstamp - '5 minute'::interval as end_time
FROM (
SELECT d.dim1_pk,
   t.tstamp,
   t.tstamp - lag(t.tstamp,1, '2011-05-15 00:00:00') OVER w AS gap
FROM
dim1 d
JOIN facts.fact_tbl f
 ON f.dim1_fk = d.dim1_pk
JOIN time_utc t
 ON t.time_pk = f.time_fk
WHERE f.tstamp between '2011-05-15 00:00:00' and '2011-06-03 23:55:00'
GROUP BY 1,2
WINDOW w AS (PARTITION BY d.dim1_pk
 ORDER BY d.dim1_pk)
ORDER BY 1,2
) AS q JOIN dim1 d2 ON d2.dim1_pk = q.dim1_pk
WHERE q.gap > '5 minute'::interval
;

That subtracts the tstamp of the previous row from the tstamp of the current
row, within a window defined on individual dim1_pk values.  The outer query
then selects only rows where the gap is greater than 5 minutes, since
sequential values will show a 5 minute interval.  It also joins to dim1
again in order to pull out the label1 and label2 values, since including
those in the window instead of dim1_pk resulted in a much slower query,
presumably because I don't have indexes on the label columns.

This avoids all of the cross join and left join craziness I was doing, which
is useful when attempting to plug gaps with default values, but a pain in
the neck when just attempting to determine where the gaps are and how large
they are.

window functions are a seriously useful tool!

--sam


[SQL] finding gaps in temporal data

2011-06-15 Thread Samuel Gendler
I have a fact table that looks like this:

dim1_fk bigint,
time_fk bigint,
tstamp timestamp without time zone,
value numeric(16,2)

The tstamp column is completely redundant to a column in the time_utc table,
but I like to leave it there for convenience when writing ad-hoc queries in
psql - it allows me to skip the join to a time dimension table.  The fact
table is actually partitioned into 1-month child tables, with check
constraints on both time_fk and the tstamp column, since there is a 1:1
relationship between those columns.

dim1_fk refers to a dim1 table which has two varchar columns we care about:

dim1_pk bigint,
label1 varchar,
label2 varchar

the time_utc table contains the usual time dimension columns, and I've
structured the primary key to be an integer in the form YYYYMMDDHH24MI - so
2011-06-15 15:35 has a primary key of 201106151535 and there is a row every
5 minutes. All data in the fact table is assigned to a given 5 minute
window. There is a row in the time_utc table for every possible time value,
regardless of whether there is data in a fact table for that 5 minute
interval.  For our purposes, we only need to care about 2 columns:

time_pk bigint,
tstamp timestamp without time zone

I'm looking to run a report which will show me any gaps in the data for any
label1/label2 pair that appears in the dim1 table - there are 0 or more rows
for each label1/label2 combination in each 5 minute window and I don't
actually care about the duplicates (all my queries aggregate multiple rows
for a given timestamp via the avg() function).  I can find every missing row
in the fact table data by just cross joining dim1 and time_utc for a given
date range and then left joining between that and the fact table. Since
every possible time value appears in the time table and every possible
label1/label2 combination appears in the dim1 table, I will see NULLs in the
fact table columns for any row that is missing in the data set.

However, it would be far nicer to get a report which just lists gaps, with a
start and end, rather than one row for each missing 5 minute interval, even
if they are sequential.  Is there any way to accomplish this?  I'm running
8.4.x at the moment, but would be willing to upgrade to 9.0.x, if necessary.
 If I can do it without the left outer join from time_utc to the fact table,
then so much the better.

A query to grab just the missing rows looks something like this:

SELECT t.tstamp,
   d.label1,
   d.label2
FROM dimensions.dim1 d
INNER JOIN dimensions.time_utc t
   ON t.tstamp between '2011-06-01' and '2011-06-05'
LEFT OUTER JOIN facts.parent_fact f
 ON f.time_fk = t.time_pk
AND f.dim1_fk = d.dim1_pk
GROUP BY 3,2,1
HAVING avg(value) IS NULL
ORDER BY 3,2,1

so all I really need to do is figure out how to combine sequential rows into
a single row with a start and end time.  Rows are sequential if row1.tstamp
- row2.tstamp = 5 minutes


Re: [SQL] Returning a set of dates

2011-06-10 Thread Samuel Gendler
On Fri, Jun 10, 2011 at 9:28 PM, C. Bensend  wrote:

> SELECT next_bill_date( '2011-01-01', '1 month', '4 months' );
>
>
> .. I expect the following result set:
>
>
>  next_bill_date
> 
>
> 2011-07-01
> 2011-08-01
> 2011-09-01
> 2011-10-01
>
>
>

http://www.postgresql.org/docs/8.4/interactive/plpgsql-control-structures.html#PLPGSQL-STATEMENTS-RETURNING


CREATE OR REPLACE FUNCTION public.next_bill_date(d date, period interval,
  i interval)
 RETURNS SETOF date
AS $function$
DECLARE
max_date date;
due_date date;
BEGIN
max_date := CURRENT_DATE + i;
due_date := d;
WHILE due_date + period <= max_date LOOP
RETURN NEXT due_date; -- add due_date to the result set
due_date := due_date + period;
END LOOP;
RETURN; -- exit function
END;
$function$ language plpgsql;


testdb=# select next_bill_date('2011-06-11', '2 week', '3 month');
next_bill_date

 2011-06-11
 2011-06-25
 2011-07-09
 2011-07-23
 2011-08-06
 2011-08-20


Re: [SQL] Re: 500KB PDF files for postgresql8.3, which is the most efficient way?

2011-05-27 Thread Samuel Gendler
On Fri, May 27, 2011 at 9:15 AM, Emi Lu  wrote:

>
>  Solution:
>  
>  (1) Save pdfs to file system, only point file name in psql8.3
>
>  (2) Save oids of pdfs into table
>
>  (3) Save pdf files as bytea column in psql8.3
>
>
> Pros and cons for (1), (2), (3), which is the most efficient way?
>

You asked about the "most efficient" but you didn't tell us what you are
doing with the pdf files.  It really doesn't matter how you store it if you
are simply inserting a 500KB object 30 times per year - then it becomes all
about convenience.  I'd probably rule out solution #1 in that case purely
from a maintenance perspective.  Storing the PDFs in the db eliminates the
maintenance work of ensuring that the files are archived along with the db
backups, that file paths stay correct whenever a new db host is provisioned,
and storing the files in the db provides transaction-safe file storage.  If,
on the other hand, you are selecting/updating the rows in question several
hundred times per second or more, then you may well want to put some thought
into efficiency and, assuming some kind of web access, providing your
webserver/caching layer with direct access to the files on the filesystem
for service efficiency.  But if you are only occasionally accessing the
files in question, there's little reason not to put them in the db.

If mybatis can't load the object that references the file without also
loading the entire file from the db - and you are using that object for
other things that will require frequent loading/storing - then you probably
don't want the files in the db.  If it can load the object without loading
the binary data, by lazily loading that column only when requested, then it
doesn't matter.  Using BLOBs guarantees that you can access the entire row
without loading the binary data if mybatis exposes the blob separately from
the rest of the object, but mybatis may be capable of issuing a select
without that column and then grabbing that column as needed in order to
simulate that in the case of a bytea column, anyway.  Internally, a large
bytea column is treated similarly to a blob, with any data over a certain
size not stored in-line with the rest of the row for efficiency reasons.
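
As a concrete illustration of selecting around the binary column, a minimal
sketch (the documents table and its contents are invented for the example):

create table documents (
  id    serial primary key,
  name  text not null,
  pdf   bytea
);

-- metadata-only access; the large, TOASTed pdf bytes are not fetched
select id, name from documents where name = 'report.pdf';

-- the payload is read only when the column is actually selected
select pdf from documents where id = 42;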


Re: [SQL] problem with update data sets from front ends 8.4, Ubuntu 10.04

2011-05-23 Thread Samuel Gendler
On Mon, May 23, 2011 at 9:53 AM, Dean le Roux  wrote:

> postgres 8.3 to 8.4  Open office base 3.2 Ubuntu 9.04 to 10.04 (new
> install)
>
> I recently upgraded from Ubuntu 9.04 to Ubuntu 10.04. Security and other
> updates have been effected. I used webmin to restore the databases.
>
> Since migrating to Linux around 2006 we were successful in converting our
> mdb
> file to postgresql - only one glitch was a difference in -1 as YES. We
> successfully used the systems with open office as a front end for the last
> few
> years. Until now - after the upgrade I can view data but not update data.
>
> Experience with sql is very limited as I have always used sql query
> builders
> over the years. The system has worked well with multiple tables, and in the
> past there was no problems with sending sql statements to postgresql.
>
> My problem in OOO 3.2 base (other front ends also) is that there is
> continually a problem with not allowing queries to update data back to
> postgresql.
>
> I believe something has changed perhaps in sql from 8.3 to 8.4, or I have
> missed something with restoring the files.
>
>

At minimum, we need to see the update statement that you are attempting to
execute, preferably with any error message you receive.


Re: [SQL] ordering by date for each ID

2011-05-12 Thread Samuel Gendler
On Thu, May 12, 2011 at 1:45 PM, Thomas Kellerer  wrote:

> Nguyen,Diep T wrote on 12.05.2011 03:59:
>
>>
>> Any help would be appreciated.
>>
>
> SELECT id,
>   date,
>   score_count,
>   row_number() over (partition by id order by date desc) as order_value
> FROM your_table
>
>
>
Or the more general answer to your question is this:
http://www.postgresql.org/docs/8.4/interactive/functions-window.html
There's lots of neat things you can do with window functions.
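
For example, wrapping that query in a subquery picks out just the newest row
per id (a sketch reusing the same hypothetical your_table):

select id, date, score_count
from (select id, date, score_count,
             row_number() over (partition by id order by date desc) as order_value
      from your_table) t
where order_value = 1;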

--sam


Re: [SQL] Sorting Issue

2011-05-10 Thread Samuel Gendler
On Tue, May 10, 2011 at 9:47 AM, Tom Lane  wrote:

> "Ozer, Pam"  writes:
> > Isn't this the English standard for collation?  Or is this a non-c
> > locale as mentioned below?  Is there anyway around this?
>
> >LC_COLLATE = 'en_US.utf8'
>
> en_US is probably using somebody's idea of "dictionary order", which
> I believe includes ignoring spaces in the first pass.  You might be
> happier using "C" collation.  Unfortunately that requires re-initdb'ing
> your database (as of existing PG releases).
>


ugh. So what's the initdb incantation necessary to sort the way I'd expect
an alphabetic sort to happen?  I'm literally just in the process of bringing
up a new project, so it's a perfect opportunity for me to get this set up
correctly to begin with.  The default on my system was definitely
en_US.utf8.
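
Something like this is what I'm imagining (untested, and the data directory
path is made up):

initdb -D /path/to/data --encoding=UTF8 --lc-collate=C --lc-ctype=en_US.UTF-8

i.e. keep UTF-8 encoding and ctype but use C collation for sort order.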

--sam


Re: [SQL] Sorting Issue

2011-05-09 Thread Samuel Gendler
On Mon, May 9, 2011 at 1:38 PM, Emi Lu  wrote:

> Hi Pam,
>
>
> >> Select VehicleTrimAbbreviated, VehicleTrimAbbreviatedId
> >> from VehicleTrimAbbreviated
> >> Where vehicleTrimAbbreviated like 'CX%'
> >> order by
> >>
> >> split_part(VehicleTrimAbbreviated, ' ', 1) asc,
> >> split_part(VehicleTrimAbbreviated, ' ', 2) asc;
>
> This query works, right?
>
> Reason:
> ==
> . split_part(VehicleTrimAbbreviated, ' ', 1) returns the string before the
> blank
>
> . split_part(VehicleTrimAbbreviated, ' ', 2) returns the string after the
> blank
>
> So
> [1] you order by CX, CXL, CXS first
> [2] you order by second part "Hatchback, Minivan... "
>
> Is it clear now?
>
>
Not really.  It should sort alphabetically the same in either case.  Here's
an alphabetic sort of similar strings in python

>>> a = ["CX Hatchback", "CXL Minivan", "CXL Premium Sedan", "CXL Sedan",
"CXL Sport Utility", "CXL Turbo Sedan", "CX Minivan", "CXS Sedan"]
>>> a.sort()
>>> a
['CX Hatchback', 'CX Minivan', 'CXL Minivan', 'CXL Premium Sedan', 'CXL
Sedan', 'CXL Sport Utility', 'CXL Turbo Sedan', 'CXS Sedan']

It's not at all clear why they are not coming out of the db in
alphabetically sorted order when  the query includes "order by
VehicleTrimAbbreviated asc"
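
If it does turn out to be locale collation at work, one byte-wise workaround
that avoids touching the cluster is to sort with the text pattern operator
(a sketch, untested against this schema):

select VehicleTrimAbbreviated, VehicleTrimAbbreviatedId
from VehicleTrimAbbreviated
where VehicleTrimAbbreviated like 'CX%'
order by VehicleTrimAbbreviated using ~<~;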


Re: [SQL] plpgsql exception handling

2011-03-10 Thread Samuel Gendler
On Thu, Mar 10, 2011 at 2:53 AM, Uwe Bartels  wrote:

> same same.
> all errors including syntax_error match to others, but I checked it again.
> and the exception remains.
>

I'm just guessing here, but is it throwing a new exception in the exception
handler?  I realize that the exception that is getting caught at the top
level is the original exception, but maybe that's expected behaviour when an
exception handler throws a new exception - it re-throws the original?  What
happens if you simplify the exception handler down to just a really simple
log message?  There's really nothing else obviously wrong with the code you
provided, so I'm hoping it is a problem in the code that you didn't include
since nothing else makes sense.


I just executed this and it worked correctly - saw my error statement from
the exception handler:

CREATE OR REPLACE FUNCTION report_process(p_sql text)
  RETURNS integer AS
$BODY$
DECLARE
l_state smallint;
l_message text;
BEGIN

l_state=0;
begin
execute 'create table result_9 as '||p_sql;
exception when others then
l_state=-3;
raise notice 'error';
end;
return 0;
END;
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER
cost 100;

# select report_process('select 1count(*) from accounts'::text);
NOTICE:  error
 report_process

  0
(1 row)


Re: [SQL] plpgsql exception handling

2011-03-10 Thread Samuel Gendler
On Thu, Mar 10, 2011 at 12:45 AM, Uwe Bartels  wrote:

> Hi,
> Yes, of course is this sql producing an error.
> The point is, I want to trap the error and handle it. Here in this case I
> set the variable l_state and l_message.
>


Doh!  Of course.  Sorry about that.  What happens when you explicitly catch
the 42601 sql state or the syntax_error condition instead of 'others'?
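
i.e. something along these lines (a self-contained sketch; the function name
is invented and this is not your full function):

create or replace function try_sql(p_sql text) returns text as $$
begin
    execute p_sql;
    return 'ok';
exception
    when syntax_error then  -- condition name for sqlstate 42601
        return 'caught ' || SQLSTATE || ': ' || SQLERRM;
end;
$$ language plpgsql;

-- select try_sql('select 1count(*) from event_log_day');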


Re: [SQL] plpgsql exception handling

2011-03-09 Thread Samuel Gendler
when I run 'select 1count(*) from table' in my postgres 8.4 installation, I
get the exact same error message.  Assuming the '1count()' function does
exist, perhaps you need to fully qualify it with a schema name?

It looks to me like the query you are passing to the procedure is invalid
and is generating the error.  Perhaps the 1 in front of count(*) is a typo?

On Wed, Mar 9, 2011 at 1:55 PM, Uwe Bartels  wrote:

> Hi,
>
> I'm trying to run an execute with a dynamic sql command within a function.
> I need a clean exception handling here, but my version does not work
> somehow.
>
> I want to trap a possible syntax error and write the error code, error
> message and the sql into a table.
>
> the function looks similar to this, I cut out the unimportant pieces.
>
> CREATE OR REPLACE FUNCTION report_process(p_sql text)
>   RETURNS integer AS
> $BODY$
> DECLARE
> l_state smallint;
> l_message text;
> BEGIN
>
> l_state=0;
> begin
> execute 'create table result_'||p_id||' as '||p_sql;
> exception when others then
> l_state=-3;
> l_message:='Error executing sql sql error code: %, sql error
> message: %, sql: %',SQLSTATE,SQLERRM,l_sql;
> end;
>
> update "cache"
> set c_date=now(), c_state=l_state, c_message=l_message
> where c_id=p_id;
>
> return 0;
> END;
> $BODY$
>   LANGUAGE plpgsql VOLATILE SECURITY DEFINER
>   COST 100;
>
>
> This is the error message when I call the function
> select report_process('select 1count(*) from event_log_day'::text);
>
>
> ERROR:  syntax error at or near "("
> LINE 1: ...e table result_9 as select 1count(*) from d...
>  ^
> QUERY:  create table result_9 as select 1count(*) from event_log_day
> CONTEXT:  PL/pgSQL function "report_process" line 31 at EXECUTE statement
>
> ** Error **
>
> ERROR: syntax error at or near "("
> SQL state: 42601
> Context: PL/pgSQL function "report_process" line 31 at EXECUTE statement
>
> Any help is appreciated.
>
> best regards,
> Uwe
>


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Samuel Gendler
On Fri, Feb 11, 2011 at 12:47 PM, Aaron Burnett wrote:

>
>
> Thank you all very much for your help.
>
> The suggestion from Osvaldo below was the best for my situation (not having
> any sort of xref table to join)...
>
>

It may work well for now, but if that foo_activity table has the potential
to get large with only a relatively small percentage of rows fitting in the
date range, it will get very slow compared to the left join, as it requires
loading every row in the table, instead of being able to use an index to
pull just the rows from the date range and then join to the list of valid
activities.

--sam


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Samuel Gendler
Assuming you have a table which lists all possible activities, with one
activity per row and no duplicates, you need to do a left outer join between
activities and your query result.  That will generate a resultset that has
at least one row for every row in activities, with nulls in all the columns
coming from the query for rows that don't have a match.  Then use coalesce
to turn null into 0.  Something like this:

select a.activity_id, coalesce(q.total, 0) as total
from activities a left outer join
(select fa.activity_id, count(fa.activity_id) as total from foo_activity fa
where fa.created between '01/01/2011' and '01/08/2011'
group by 1) q on a.activity_id = q.activity_id
order by a.activity_id

If you don't have an activities table with one row per activity, just
replace the activities table in that query with another query - select
distinct activity_id from foo_activity
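
Spelled out, that variant looks something like this (a sketch assembled from
the query above, untested):

select a.activity_id, coalesce(q.total, 0) as total
from (select distinct activity_id from foo_activity) a
left outer join
(select fa.activity_id, count(fa.activity_id) as total from foo_activity fa
where fa.created between '01/01/2011' and '01/08/2011'
group by 1) q on a.activity_id = q.activity_id
order by a.activity_id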


On Fri, Feb 11, 2011 at 10:46 AM, Aaron Burnett wrote:

>
> Hi,
>
> I'm just drawing a blank entirely today and would appreciate some help on
> this.
>
> The long and short; there are 12 distinct activities that need to be
> queried
> on a weekly basis:
>
> SELECT count(activity_id), activity_id
> FROM foo_activity
> WHERE created >= '01/01/2011' and created < '01/08/2011'
> GROUP BY 2
> ORDER BY 2;
>
> It gives me this answer, which is correct:
>
>  count | activity_id
> ---+-
>  1502 |   1
>11 |   2
> 2 |   3
>   815 |   4
>  4331 |   7
>30 |   9
>  1950 |  10
> 7 |  11
>67 |  12
>
> But what I need to see is if there are no activities for the particular
> activity_id that week, that it lists the count as 0 and lists the
> activity_id associated like this:
>
>  count | activity_id
> ---+-
>  1502 |   1
>11 |   2
> 2 |   3
>   815 |   4
> 0 |   5
> 0 |   6
>  4331 |   7
> 0 |   8
>30 |   9
>  1950 |  10
> 7 |  11
>67 |  12
>
> Thanking you in advance for any help on this. The caffeine seems to be not
> working well today.
>
> Aaron
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: [SQL] DELETE FROM takes forever

2011-02-10 Thread Samuel Gendler
On Thu, Feb 10, 2011 at 9:57 AM, Josh  wrote:

> Hi
>
> I'm trying to do a DELETE FROM on my large table (about 800 million
> rows) based on the contents of another, moderately large table (about
> 110 million rows). The command I'm using is:
>
> DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);
>
> This process ran for about two weeks before I decided to stop it -- it
> was dragging down the DB server. I can understand long-running
> processes, but two weeks seems a bit much even for a big table.
>
> Is this the best way to approach the problem? Is there a better way?
>

You need

delete from records r where not exists (select 1 from unique_records ur
where ur.id = r.id);


Re: [SQL] why SELECT query needs relationname (schemaName.tableName)

2011-02-07 Thread Samuel Gendler
You need to modify the search_path for that user. Read the first section of
this page very carefully:
http://www.postgresql.org/docs/8.4/static/runtime-config-client.html

The schema that is used for new objects that don't have an explicit schema
is always the first schema in the search_path.

You can permanently alter the search_path for the user like this:

alter user  set search_path to cust, other_schema, public,
pg_catalog;

There's more info on search_path here:
http://www.postgresql.org/docs/8.4/static/ddl-schemas.html
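
You can also try it for just your current session before altering the role
(a quick sketch, using the schema name from your message):

SET search_path TO cust, public;
SHOW search_path;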

On Mon, Feb 7, 2011 at 10:56 PM, Amar Dhole  wrote:

>  Hi All,
>
>
>
> I have a query regarding database.
>
>
>
> I have database ‘customer’ which is having schema ‘cust.’
>
>
>
> I have added my tables and data into ‘cust’. But when I try to run the sql
> queries against this schema I need to add ‘cust’ as a relationName for every
> query. But same is  not applicable for public schema.
>
>
>
> Same is applicable when I query with jdbc.
>
>
>
> Can any one guide me how can I make my ‘cust’ schema to work like public
> schema means for all the sqls I don’t need to append schema name as
> relation name.
>
>
> Thanks
>
> Amar
>


Re: [SQL] question about reg. expression

2011-01-19 Thread Samuel Gendler
I'd think you need to indicate multiple alphabetic matches.  Your first
regex actually matches only b followed by end of string and the second is
really only matching start of string followed by a.  The third is looking
for a single character string.

Try this: select 'ab' ~ '^[a-z]+$'
or this: select 'ab' ~ '^[a-z]*$'

or if looking only for 2 character strings: select 'ab' ~ '^[a-z][a-z]$'

On Tue, Jan 18, 2011 at 3:41 PM, andrew1  wrote:

> hi all,
>
> these return t:
> select 'ab' ~ '[a-z]$'
> select 'ab' ~ '^[a-z]'
>
> select 'ab' ~ '^[a-z]$' returns f
>
> Can't I use ^ and $ at the same time to match, in this case?
> thanks.
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: [SQL] pattern matching with dates?

2011-01-07 Thread Samuel Gendler
On Wed, Jan 5, 2011 at 11:39 AM, Good, Thomas  wrote:

> This dubious query worked well previously:
> select * from db_log where log_date LIKE '2011-01-%';
> (currently works on bluehost.com where they run 8.1.22)
>
> Can someone offer a (preferably ANSI compliant) way to do this on 8.4.5?
>
> I realize that >= and so on work well (which may explain why the docs
> are pretty silent about pattern matching with dates) but sometimes it's
> nice to
> treat the (ISO) date as a string.
>
>
I'd think that avoiding treating the date as a string would lend itself to
using indexes so would get much faster results, but even without indexes,
handling dates as binary types is likely faster than regex comparisons on
the column cast to a string.

I made your original query work on my 8.4.x database by just explicitly
casting the column to text like this:

select * from db_log where log_date::text LIKE '2011-01-%'


Are you looking for all rows where log_date is in january, 2011 or where
log_date is in current month or something else?  I can think of lots of
potential ways to solve this:

where log_date >= date_trunc('month', '2011-01-01'::date)
where date_trunc('month', log_date) = '2011-01-01'
where log_date >= date_trunc('month', current_date)
where date_trunc('month', log_date) = date_trunc('month', current_date)

where date_part('year', log_date) = 2011 and date_part('month', log_date) =
1
where date_part('year', log_date) = date_part('year', current_date) and
date_part('month', log_date) = date_part('month', current_date)

where log_date between x and y

where ('2011-01-01'::date, '2011-01-01'::date + interval '1 month') overlaps
(log_date, log_date)
where (current_date, current_date + interval '1 month') overlaps (log_date,
log_date)


http://www.postgresql.org/docs/8.4/static/functions-datetime.html


Re: [SQL] Database consistency after a power shortage

2010-12-15 Thread Samuel Gendler
On Wed, Dec 15, 2010 at 11:38 PM, Scott Marlowe wrote:

> On Wed, Dec 15, 2010 at 8:12 AM, Alberto  wrote:
>
> >
> > Is there any way to make the 3 operations be one transaction for the
> > database, so that it keeps them all consistent in case a power shortage
> > occurs in the middle?
>
> Yes, put them in a transaction.
>
> begin;
> insert into head_invoice ...
> insert into detail_invoice ...
> insert into payments_x_header_invoice ...
> commit;
>
> Then they either all go or none go.
>

But if the database transaction concept is new to you, I highly recommend
you do a little reading about database transactions in general and postgres'
implementation specifics as well.  It can be very easy for you to make
mistakes that can cause the database to get slow or use up a lot of disk if
you use transactions without understanding at least a little of what is
happening in the database while the transaction is open but uncommitted.

Incidentally, any error on a query within the transaction will cause the
transaction to automatically 'rollback' when the transaction completes,
undoing all of the changes, or you can manually cancel a transaction by
issuing a 'rollback;' statement instead of 'commit;' at the end.
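
A minimal sketch you can paste into psql to watch a rollback happen (the
table here is a throwaway, not one of your invoice tables):

begin;
create temp table demo_invoice (amount numeric);
insert into demo_invoice values (10.00);
rollback;
-- "select * from demo_invoice" now errors: the create and the insert were
-- both undone by the rollback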


Re: [SQL] sqlplus reporting equivalent in postgres?

2010-12-10 Thread Samuel Gendler
On Fri, Dec 10, 2010 at 7:28 PM, Gary Chambers  wrote:

> psql - not as advanced, doesn't have all the features SQL*Plus has.
>>>
>> On the other hand, it is at least capable of command history and readline
>> support.
>>
>
> Yes, but rlwrap will eliminate that limitation.
>
>
Damn!  Why didn't anyone tell me about that a couple of years ago when I had
to deal with Oracle?


Re: [SQL] sqlplus reporting equivalent in postgres?

2010-12-10 Thread Samuel Gendler
On Fri, Dec 10, 2010 at 5:44 PM, Igor Neyman  wrote:

>
>
>
> -Original Message-
> From: Tony Capobianco 
> [mailto:tcapobia...@prospectiv.com
> ]
> Sent: Thu 12/9/2010 4:47 PM
> To: pgsql-sql@postgresql.org
> Subject: sqlplus reporting equivalent in postgres?
>
> We're in the process of porting our Oracle 10.2 datawarehouse over to
> PostGres 8.4.  One thing we rely upon are daily/hourly reports that are
> run and formatted in sqlplus (break on report, compute sum, etc.).  Is
> there an equivalent available in postgres?
>
> Thanks.
> Tony
>
> psql - not as advanced, doesn't have all the features SQL*Plus has.
>
>
On the other hand, it is at least capable of command history and readline
support.


Re: [SQL] aggregation question

2010-12-01 Thread Samuel Gendler
On Tue, Nov 30, 2010 at 7:47 PM, Samuel Gendler
wrote:

> I have a fact table with a number of foreign keys to dimension data and
> some measure columns.  Let's assume I've got dimension columns a,b,c,d,e,
> and f and measure columns x, and y.
>
> I need to be able to find the value of f, x, and y for the row with
> min(x/y) when e in (1,2) and max(x/y) when e not in (1,2) with grouping by
> a,b,c, and d.
>
> A query which would identify the row in question would look like this:
>
> select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by
> a,b,c,d
>
> But I don't know how to go about also extracting the value of x, y, and f
> for the row which is matched - I suppose I could join to a subquery, but
> that strikes me as probably not being the most efficient way to accomplish
> it.
>
> Actually, it occurs to me that I've potentially got more than one row for
> any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f
>
> Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as
> well as a,b,c,d,e) for the f with min(sum(x)/sum(y))
>
>
I solved this, but I'm not sure if my solution is ideal

-- select distinct causes multiple rows that tie for rank = 1
-- to condense down to a single row. Outer query also strips
-- any rows from windows with less than 3 entries.

select distinct count, total_ms, time, fk_e,
   fk_a, fk_b, fk_c
from (
-- grab count and total_ms for each fk_f, grouped by
-- a,b,c, time, and e.  Use rank() window function
-- to rank each row in the window, ordered by total_ms/count
-- and use count(*) as window function to get total rows in
-- the window.
select sum(f.x) as count, sum(f.y) as total_ms,
   f.time as time,
   f.fk_a, f.fk_b, f.fk_c,
   f.fk_e, f.fk_f,
   rank() over (w_agg order by sum(f.x)/sum(f.y)) as rank,
   count(*) over (w_agg) as rows
from fact_table f
group by 3,4,5,6,7,8
window w_agg as (partition by time,
 fk_a, fk_b, fk_c,
 fk_e)
) q
where q.rank = 1 and q.rows >= 3



I will need to run this twice - once with rank() ordered descending and the
inner query actually limited by fk_e in (4,14) and once with rank() ordered
ascending for fk_e not in (4,14).   It is possible I can put a case
statement in the inner select in order to handle that.  I haven't tried it
yet.
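
The case statement version of the rank() expression might look roughly like
this (an untested sketch; only this one line of the inner select changes):

    rank() over (w_agg order by case when f.fk_e in (4,14)
                                      then -(sum(f.x)/sum(f.y))
                                      else sum(f.x)/sum(f.y)
                                 end) as rank,

Since fk_e is constant within each window partition, negating the ratio for
the fk_e values that need descending order lets a single ascending rank()
cover both cases.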

Is there a better solution than the outer 'select distinct' in order to only
see rows that have rank = 1 and to never have duplicate rows?


[SQL] aggregation question

2010-11-30 Thread Samuel Gendler
I have a fact table with a number of foreign keys to dimension data and some
measure columns.  Let's assume I've got dimension columns a,b,c,d,e, and f
and measure columns x, and y.

I need to be able to find the value of f, x, and y for the row with min(x/y)
when e in (1,2) and max(x/y) when e not in (1,2) with grouping by a,b,c, and
d.

A query which would identify the row in question would look like this:

select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by
a,b,c,d

But I don't know how to go about also extracting the value of x, y, and f
for the row which is matched - I suppose I could join to a subquery, but
that strikes me as probably not being the most efficient way to accomplish
it.

Actually, it occurs to me that I've potentially got more than one row for
any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f

Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as
well as a,b,c,d,e) for the f with min(sum(x)/sum(y))