Re: [GENERAL] Index impact on update?

2017-01-04 Thread Tomas Vondra

On 01/04/2017 05:59 PM, Israel Brewster wrote:

Short version:
Do indexes impact the speed of an UPDATE, even when the indexed columns
aren't changing?



They shouldn't, as long as the updated tuple can be updated on the same 
page (8kB chunk of data). In that case we can do a HOT update for the 
row, without updating the index(es).


But as you're updating the whole table, that would require about 50% of 
all pages to be free, which is unlikely to be true. So perhaps some 
updates can proceed without touching indexes, but most can't.



Details:
I have a table containing geographical data (Latitude, longitude, and
elevation) with 406,833,705 records. The Latitude and Longitude columns
are indexed. In order to better utilize the data, I've been looking into
PostGIS, and decided I wanted to add a "Location" column with PostGIS
type "GEOGRAPHY(point)". I then tried to populate it from the existing
latitude/longitude data using the following query:

UPDATE data SET
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||'
'||lat::text||')');

I expected this update to take quite a while, since it has 406 million
rows to update, but at this point it's been over 19 hours since I
started the query, and it still hasn't completed.

I'm wondering if the presence of the indexes could be slowing things
down even though the indexed columns aren't being updated? Would I be
better off canceling the update query, dropping the indexes, and trying
again? Or is more likely that the update query is "almost" done, and it
would be better to just let it run it's course? Or is there an even
better option, such as perhaps exporting the data, adding the additional
column in a text editor, and re-importing the data with a COPY command?



As explained above, it's likely that such full-table update has to 
modify the indexes anyway, making it much more expensive. Without 
additional information it's however impossible to confirm that's what's 
causing the long update in this case - there may be other bits slowing 
it down - e.g. foreign keys checks, triggers.


CREATE TABLE AS SELECT would not pay any of those costs, of course. 
Also, if you're running with wal_level=minimal, it would not have to 
write the changes into WAL, while the regular UPDATE has to do that.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index impact on update?

2017-01-04 Thread Israel Brewster


> On Jan 4, 2017, at 8:08 AM, Paul Ramsey  wrote:
> 
> You'd be better off forcing the table to write in bulk with something like
> 
> CREATE TABLE mynewtable AS
> SELECT *, geography(ST_SetSRID(ST_MakePoint(lng, lat), 4326)) AS geog
> FROM myoldtable;
> 
> Then index the new table, rename, etc. Bulk update will, in addition to being 
> slow, use 2x the amount of space on disk, as all the old tuples are left 
> behind from the update until you cluster or vacuum full the table.
> 
> P
> 

Well, that was definitely way faster. Created the table with the additional 
column in about 10 minutes rather than 19+ hours. Now to see how long it takes 
to generate the indexes :-)

Thanks again!
---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> 
> On Wed, Jan 4, 2017 at 8:59 AM, Israel Brewster  > wrote:
> Short version:
> Do indexes impact the speed of an UPDATE, even when the indexed columns 
> aren't changing?
> 
> Details:
> I have a table containing geographical data (Latitude, longitude, and 
> elevation) with 406,833,705 records. The Latitude and Longitude columns are 
> indexed. In order to better utilize the data, I've been looking into PostGIS, 
> and decided I wanted to add a "Location" column with PostGIS type 
> "GEOGRAPHY(point)". I then tried to populate it from the existing 
> latitude/longitude data using the following query:
> 
> UPDATE data SET 
> location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
> '||lat::text||')');
> 
> I expected this update to take quite a while, since it has 406 million rows 
> to update, but at this point it's been over 19 hours since I started the 
> query, and it still hasn't completed. 
> 
> I'm wondering if the presence of the indexes could be slowing things down 
> even though the indexed columns aren't being updated? Would I be better off 
> canceling the update query, dropping the indexes, and trying again? Or is 
> more likely that the update query is "almost" done, and it would be better to 
> just let it run it's course? Or is there an even better option, such as 
> perhaps exporting the data, adding the additional column in a text editor, 
> and re-importing the data with a COPY command?
> 
> Thanks for any feedback/advice you can offer!
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293 
> ---
> 
> 
> 
> 
> 
> 



Re: [GENERAL] Index impact on update?

2017-01-04 Thread Israel Brewster
On Jan 4, 2017, at 8:08 AM, Paul Ramsey  wrote:
> 
> You'd be better off forcing the table to write in bulk with something like
> 
> CREATE TABLE mynewtable AS
> SELECT *, geography(ST_SetSRID(ST_MakePoint(lng, lat), 4326)) AS geog
> FROM myoldtable;
> 
> Then index the new table, rename, etc. Bulk update will, in addition to being 
> slow, use 2x the amount of space on disk, as all the old tuples are left 
> behind from the update until you cluster or vacuum full the table.
> 
> P

Thanks for the suggestion, info, and MUCH nicer looking syntax (perhaps more 
efficient as well?) for populating the column. I'll give it a shot, and see how 
it goes!

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---

> 
> 
> 
> On Wed, Jan 4, 2017 at 8:59 AM, Israel Brewster  > wrote:
> Short version:
> Do indexes impact the speed of an UPDATE, even when the indexed columns 
> aren't changing?
> 
> Details:
> I have a table containing geographical data (Latitude, longitude, and 
> elevation) with 406,833,705 records. The Latitude and Longitude columns are 
> indexed. In order to better utilize the data, I've been looking into PostGIS, 
> and decided I wanted to add a "Location" column with PostGIS type 
> "GEOGRAPHY(point)". I then tried to populate it from the existing 
> latitude/longitude data using the following query:
> 
> UPDATE data SET 
> location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
> '||lat::text||')');
> 
> I expected this update to take quite a while, since it has 406 million rows 
> to update, but at this point it's been over 19 hours since I started the 
> query, and it still hasn't completed. 
> 
> I'm wondering if the presence of the indexes could be slowing things down 
> even though the indexed columns aren't being updated? Would I be better off 
> canceling the update query, dropping the indexes, and trying again? Or is 
> more likely that the update query is "almost" done, and it would be better to 
> just let it run it's course? Or is there an even better option, such as 
> perhaps exporting the data, adding the additional column in a text editor, 
> and re-importing the data with a COPY command?
> 
> Thanks for any feedback/advice you can offer!
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293 
> ---
> 
> 
> 
> 
> 
> 



Re: [GENERAL] Index impact on update?

2017-01-04 Thread Rob Sargent



On 01/04/2017 09:59 AM, Israel Brewster wrote:

Short version:
Do indexes impact the speed of an UPDATE, even when the indexed 
columns aren't changing?


Details:
I have a table containing geographical data (Latitude, longitude, and 
elevation) with 406,833,705 records. The Latitude and Longitude 
columns are indexed. In order to better utilize the data, I've been 
looking into PostGIS, and decided I wanted to add a "Location" column 
with PostGIS type "GEOGRAPHY(point)". I then tried to populate it from 
the existing latitude/longitude data using the following query:


UPDATE data SET 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')');


I expected this update to take quite a while, since it has 406 million 
rows to update, but at this point it's been over 19 hours since I 
started the query, and it still hasn't completed.


I'm wondering if the presence of the indexes could be slowing things 
down even though the indexed columns aren't being updated? Would I be 
better off canceling the update query, dropping the indexes, and 
trying again? Or is more likely that the update query is "almost" 
done, and it would be better to just let it run it's course? Or is 
there an even better option, such as perhaps exporting the data, 
adding the additional column in a text editor, and re-importing the 
data with a COPY command?


Thanks for any feedback/advice you can offer!
I would use a "create table redo as select *, 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')') from original;" then index that and drop original.  
Or just "create table location as select 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')');" along with what ever id you have for the original 
tuple (if it's not just lat+lon) and join or view as necessary after 
indexing.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index impact on update?

2017-01-04 Thread Rob Sargent



On 01/04/2017 09:59 AM, Israel Brewster wrote:

Short version:
Do indexes impact the speed of an UPDATE, even when the indexed 
columns aren't changing?


Details:
I have a table containing geographical data (Latitude, longitude, and 
elevation) with 406,833,705 records. The Latitude and Longitude 
columns are indexed. In order to better utilize the data, I've been 
looking into PostGIS, and decided I wanted to add a "Location" column 
with PostGIS type "GEOGRAPHY(point)". I then tried to populate it from 
the existing latitude/longitude data using the following query:


UPDATE data SET 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')');


I expected this update to take quite a while, since it has 406 million 
rows to update, but at this point it's been over 19 hours since I 
started the query, and it still hasn't completed.


I'm wondering if the presence of the indexes could be slowing things 
down even though the indexed columns aren't being updated? Would I be 
better off canceling the update query, dropping the indexes, and 
trying again? Or is more likely that the update query is "almost" 
done, and it would be better to just let it run it's course? Or is 
there an even better option, such as perhaps exporting the data, 
adding the additional column in a text editor, and re-importing the 
data with a COPY command?


Thanks for any feedback/advice you can offer!
I would use a "create table redo as select *, 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')') from original;" then index that and drop original.  
Or just "create table location as select 
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' 
'||lat::text||')');" along with what ever id you have for the original 
tuple (if it's not just lat+lon) and join or view as necessary after 
indexing.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index impact on update?

2017-01-04 Thread Paul Ramsey
You'd be better off forcing the table to write in bulk with something like

CREATE TABLE mynewtable AS
SELECT *, geography(ST_SetSRID(ST_MakePoint(lng, lat), 4326)) AS geog
FROM myoldtable;

Then index the new table, rename, etc. Bulk update will, in addition to
being slow, use 2x the amount of space on disk, as all the old tuples are
left behind from the update until you cluster or vacuum full the table.

P



On Wed, Jan 4, 2017 at 8:59 AM, Israel Brewster 
wrote:

> Short version:
> Do indexes impact the speed of an UPDATE, even when the indexed columns
> aren't changing?
>
> Details:
> I have a table containing geographical data (Latitude, longitude, and
> elevation) with 406,833,705 records. The Latitude and Longitude columns are
> indexed. In order to better utilize the data, I've been looking into
> PostGIS, and decided I wanted to add a "Location" column with PostGIS type
> "GEOGRAPHY(point)". I then tried to populate it from the existing
> latitude/longitude data using the following query:
>
> UPDATE data SET location=ST_GeographyFromText(
> 'SRID=4326;POINT('||lng::text||' '||lat::text||')');
>
> I expected this update to take quite a while, since it has 406 million
> rows to update, but at this point it's been over 19 hours since I started
> the query, and it still hasn't completed.
>
> I'm wondering if the presence of the indexes could be slowing things down
> even though the indexed columns aren't being updated? Would I be better off
> canceling the update query, dropping the indexes, and trying again? Or is
> more likely that the update query is "almost" done, and it would be better
> to just let it run it's course? Or is there an even better option, such as
> perhaps exporting the data, adding the additional column in a text editor,
> and re-importing the data with a COPY command?
>
> Thanks for any feedback/advice you can offer!
> ---
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> ---
>
>
>
>
>
>


[GENERAL] Index impact on update?

2017-01-04 Thread Israel Brewster
Short version:Do indexes impact the speed of an UPDATE, even when the indexed columns aren't changing?Details:I have a table containing geographical data (Latitude, longitude, and elevation) with 406,833,705 records. The Latitude and Longitude columns are indexed. In order to better utilize the data, I've been looking into PostGIS, and decided I wanted to add a "Location" column with PostGIS type "GEOGRAPHY(point)". I then tried to populate it from the existing latitude/longitude data using the following query:UPDATE data SET location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||' '||lat::text||')');I expected this update to take quite a while, since it has 406 million rows to update, but at this point it's been over 19 hours since I started the query, and it still hasn't completed. I'm wondering if the presence of the indexes could be slowing things down even though the indexed columns aren't being updated? Would I be better off canceling the update query, dropping the indexes, and trying again? Or is more likely that the update query is "almost" done, and it would be better to just let it run it's course? Or is there an even better option, such as perhaps exporting the data, adding the additional column in a text editor, and re-importing the data with a COPY command?Thanks for any feedback/advice you can offer!
---Israel BrewsterSystems Analyst IIRavn Alaska5245 Airport Industrial RdFairbanks, AK 99709(907) 450-7293---BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD