[CentOS] time foo

2017-12-01 Thread hw


Hi,

isn't this weird:


# time foo
real    43m39.841s
user    15m31.109s
sys     0m44.136s


Almost 30 minutes have disappeared, but it actually took about that long,
so what happened?


Re: [CentOS] time foo

2017-12-01 Thread Stephen John Smoogen
On 1 December 2017 at 11:49, hw  wrote:
>
> Hi,
>
> isn't this weird:
>
>
> # time foo
> real    43m39.841s
> user    15m31.109s
> sys     0m44.136s
>

The user and sys figures count the CPU time that the process actually used.
If the process is not on the CPU but is waiting on input, the network, disk,
etc., that time does not get counted in user or sys. There is also the fact
that the built-in bash time keyword you used reports things differently from
the /usr/bin/time command.

From the /usr/bin/time man page:
   Note: some shells (e.g., bash(1)) have a built-in time command that
   provides less functionality than the command described here.  To access
   the real command, you may need to specify its pathname (something like
   /usr/bin/time).

From the bash man page:
   When the shell is in posix mode, time may be followed by a newline.  In
   this case, the shell displays the total user and system time consumed
   by the shell and its children.  The TIMEFORMAT variable may be used to
   specify the format of the time information.

The built-in time is actually meant for measuring an entire pipeline, but it
can be used on a single command by itself.


>
> Almost 30 minutes have disappeared, but it actually took about that long,
> so what happened?



-- 
Stephen J Smoogen.


Re: [CentOS] time foo

2017-12-01 Thread Gordon Messmer

On 12/01/2017 08:49 AM, hw wrote:

# time foo
real    43m39.841s
user    15m31.109s
sys     0m44.136s


Almost 30 minutes have disappeared, but it actually took about that long,
so what happened? 



I may misunderstand your question, but

"time" is provided by the bash shell.  It may be provided by a command 
if you are using a different shell.  When the command following the 
"time" keyword completes, bash will print the amount of elapsed time 
(the amount of time that passed between the command's start and its 
exit), the amount of time the command was using the CPU and not in a 
sleep state, and the amount of time the kernel was using the CPU to 
service requests from the command.


So your "foo" application was in a sleep state for around 30 minutes of 
the 44 minutes that passed between when you started it and when it finished.
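
For illustration, here is a minimal Perl sketch of the same three numbers (a
made-up script, call it timeit.pl; nothing from the thread): wall-clock time
from time(), and the child's user/system CPU time from times().

    #!/usr/bin/perl
    # Wall-clock ("real") seconds versus CPU seconds charged to the child.
    # Time the child spends sleeping or waiting (on a database, the network,
    # a disk, ...) shows up only in "real".
    use strict;
    use warnings;

    die "usage: $0 command [args...]\n" unless @ARGV;

    my $start = time();                            # wall-clock seconds
    system(@ARGV);                                 # run the command and wait for it
    my (undef, undef, $cuser, $csys) = times();    # CPU time used by child processes

    printf "real %ds  user %.2fs  sys %.2fs\n",
        time() - $start, $cuser, $csys;

Running it on something that mostly waits, e.g. "perl timeit.pl sleep 30",
prints roughly real 30s with user and sys near zero --- the same effect as the
"missing" 30 minutes: foo was waiting (most likely on the database or on I/O),
not computing.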




Re: [CentOS] time foo

2017-12-01 Thread hw

Gordon Messmer wrote:

On 12/01/2017 08:49 AM, hw wrote:

# time foo
real    43m39.841s
user    15m31.109s
sys     0m44.136s


Almost 30 minutes have disappeared, but it actually took about that long,
so what happened?



I may misunderstand your question, but

"time" is provided by the bash shell.  It may be provided by a command if you are using a 
different shell.  When the command following the "time" keyword completes, bash will 
print the amount of elapsed time (the amount of time that passed between the command's start and 
its exit), the amount of time the command was using the CPU and not in a sleep state, and the 
amount of time the kernel was using the CPU to service requests from the command.

So your "foo" application was in a sleep state for around 30 minutes of the 44 
minutes that passed between when you started it and when it finished.


Hm.  Foo is a program that imports data into a database from two CVS files,
using a connection for each file and forking to import both files at once.

So this would mean that the database (running on a different server) takes
almost twice as much time as foo --- which I would consider kinda excruciatingly
long because it's merely inserting rows into two different tables after they were
prepared by foo, and then processing some queries to convert the data.

The queries after importing may take like 3 or 5 minutes.  About 4.5 million rows
are being imported.

Would you consider about 20 minutes for the import to be long?



Re: [CentOS] time foo

2017-12-01 Thread Stephen John Smoogen
On 1 December 2017 at 14:32, hw  wrote:
> Gordon Messmer wrote:
>>
>> On 12/01/2017 08:49 AM, hw wrote:
>>>
>>> # time foo
>>> real    43m39.841s
>>> user    15m31.109s
>>> sys     0m44.136s
>>>
>>>
>>> Almost 30 minutes have disappeared, but it actually took about that long,
>>> so what happened?
>>
>>
>>
>> I may misunderstand your question, but
>>
>> "time" is provided by the bash shell.  It may be provided by a command if
>> you are using a different shell.  When the command following the "time"
>> keyword completes, bash will print the amount of elapsed time (the amount of
>> time that passed between the command's start and its exit), the amount of
>> time the command was using the CPU and not in a sleep state, and the amount
>> of time the kernel was using the CPU to service requests from the command.
>>
>> So your "foo" application was in a sleep state for around 30 minutes of
>> the 44 minutes that passed between when you started it and when it finished.
>
>
> Hm.  Foo is a program that imports data into a database from two CVS files,
> using a connection for each file and forking to import both files at once.
>
> So this would mean that the database (running on a different server) takes
> almost twice as much time as foo --- which I would consider kinda excruciatingly
> long because it's merely inserting rows into two different tables after they were
> prepared by foo, and then processing some queries to convert the data.
>
> The queries after importing may take like 3 or 5 minutes.  About 4.5 million rows
> are being imported.
>
> Would you consider about 20 minutes for the import to be long?

That depends on a lot of things... from drive speed to drive layout to
the database to network congestion to... Without that information the
question is not answerable.


>



-- 
Stephen J Smoogen.


Re: [CentOS] time foo

2017-12-01 Thread Mark Haney

On 12/01/2017 02:32 PM, hw wrote:



Hm.  Foo is a program that imports data into a database from two CVS files,
using a connection for each file and forking to import both files at once.

So this would mean that the database (running on a different server) takes
almost twice as much time as foo --- which I would consider kinda excruciatingly
long because it's merely inserting rows into two different tables after they were
prepared by foo, and then processing some queries to convert the data.

The queries after importing may take like 3 or 5 minutes.  About 4.5 million rows
are being imported.

Would you consider about 20 minutes for the import to be long?


There are far too many variables you've not mentioned to determine if 
that's good or bad (or very bad).  Is the connection a local connection 
(ie the import is done on the DB server) or a network connection?


What size are the CSV (CVS is a typo, correct?) files?  4.5M rows tells 
us nothing about how much data each row has.  It could be 4.5M rows of 
one INT field or 4.5M rows of a hundred fields.


I'm a bit confused by the last two sentences.  Based on how I read this:

1. Foo is prepping (creating?) the tables
2. Processes queries to convert the data (to CSV?)
3. Runs more queries on those tables.

Or it could be:

1. Foo preps the tables
2. Foo imports the CSV files
3. Foo does post-processing of the tables.

The actual process isn't really clear, but I'll go on the assumption 
that Foo is creating the tables with the correct fields, data types, 
keys and hopefully indices. Then dumps the CSV files into the tables. 
Then does post-processing.  (I've written similar scripts, so this is 
the most logical process to me.)


If we assume network bandwidth is fine, that still leaves far too many 
server variables to know if 20m is about right or not.  Amount of data 
to import, TYPE of data, database AND server configuration, CPU, RAM, 
etc., and DB config for tunable parameters like buffer pool, read/write I/O 
threads, etc.


IIRC, you posted some questions about tuning a DB server a while back; would 
this be data going into that server, perhaps?


I'd like to offer a helpful suggestion when asking for list help.  It's 
better to provide TOO MUCH information, than too little.  There's a big 
difference between 'my printer won't print' and 'my printer won't print 
because it's not feeding paper properly'.



--
Mark Haney
Network Engineer at NeoNova
919-460-3330 option 1
mark.ha...@neonova.net
www.neonova.net


Re: [CentOS] time foo

2017-12-01 Thread John R Pierce

On 12/1/2017 11:32 AM, hw wrote:
So this would mean that the database (running on a different server) takes
almost twice as much time as foo --- which I would consider kinda excruciatingly
long because it's merely inserting rows into two different tables after they were
prepared by foo, and then processing some queries to convert the data.

The queries after importing may take like 3 or 5 minutes.  About 4.5 million rows
are being imported.


So you're missing about 25 minutes, and maybe 5 minutes is spent
post-processing, so that's 20 minutes spent in the data insertion?


Inserting one row at a time?  Or in batches?  Remember, a database server is
going to do commits after each transaction, which forces the data to be flushed
to disk.  4.5 million separate row transactions, yeah, I could see that taking
some time, plus add that many network round trips, etc., etc.  If the db server
just has a single SATA disk, you're doing 9 million committed writes combined
to the two tables?  20 minutes for 9 million inserts, that's 7500 per second.
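
To make the per-row-commit cost concrete, here is a rough DBI sketch of the
cheaper pattern (everything in it is illustrative: the PostgreSQL DSN, the
table and column names are made up, and the naive split stands in for real
CSV parsing):

    use strict;
    use warnings;
    use DBI;

    # With AutoCommit left on, every INSERT is its own transaction: one commit
    # (a flush to disk) and one network round trip per row.  Turning AutoCommit
    # off and committing every N rows pays that cost once per batch instead.
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb;host=dbhost', 'user', 'secret',
                           { AutoCommit => 0, RaiseError => 1 });

    my $sth = $dbh->prepare('INSERT INTO import_tab (a, b, c) VALUES (?, ?, ?)');
    my $n   = 0;
    while (my $line = <STDIN>) {        # stand-in for the program's CSV reader
        chomp $line;
        my @f = split /,/, $line;       # naive split, just for illustration
        $sth->execute(@f[0 .. 2]);
        $dbh->commit if ++$n % 10_000 == 0;   # commit per 10,000 rows, not per row
    }
    $dbh->commit;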



--
john r pierce, recycling bits in santa cruz



Re: [CentOS] time foo

2017-12-02 Thread hw

Mark Haney wrote:

On 12/01/2017 02:32 PM, hw wrote:



Hm.  Foo is a program that imports data into a database from two CVS files,
using a connection for each file and forking to import both files at once.

So this would mean that the database (running on a different server) takes
almost twice as much time as foo --- which I would consider kinda excruciatingly
long because it's merely inserting rows into two different tables after they were
prepared by foo, and then processing some queries to convert the data.

The queries after importing may take like 3 or 5 minutes.  About 4.5 million rows
are being imported.

Would you consider about 20 minutes for the import to be long?


There are far too many variables you've not mentioned to determine if that's 
good or bad (or very bad).  Is the connection a local connection (ie the import 
is done on the DB server) or a network connection?


Foo is running on a different machine than the database server.


What size are the CSV (CVS is a typo, correct?) files?  4.5M rows tells us 
nothing about how much data each row has.  It could be 4.5M rows of one INT 
field or 4.5M rows of a hundred fields.


One CSV is 70745427 bytes, the other one is 536302424 bytes (68M and 512M).
That's 18 and 23 fields or so to insert for each row.


I'm a bit confused by the last two sentences.  Based on how I read this:

1. Foo is prepping (creating?) the tables
2. Processes queries to convert the data (to CSV?)
3. Runs more queries on those tables.

Or it could be:

1. Foo preps the tables


... deletes the rows that were imported the last time.
The rows from last time are being imported again, plus new rows.

Only importing new rows would require checking every row that is
being imported to figure out whether it's already there, which may not
be so much faster as to be worthwhile, and since I usually don't
need to wait for the import to finish, it doesn't really matter.


2. Foo imports the CSV files
3. Foo does post-processing of the tables.


right


It's not really clear the actual process, but I'll go on the assumption that 
Foo is creating the tables with the correct fields, data types, keys and 
hopefully indices. Then dumps the CSV files into the tables. Then does 
post-processing.  (I've written similar scripts, so this is the most logical 
process to me.)

If we assume network bandwidth is fine, that still leaves far too many server 
variables to know if 20m is about right or not.  Amount of data to import, TYPE 
of data, database AND server configuration, CPU, RAM, etc and DB config for 
tunable parameters like buffer pool, read/write I/O threads, etc.


The servers are connected by 4x1Gbit, using LACP.


IIRC, you posted some questions about tuning a DB server a while back, would 
this be data going into that server, perhaps?


right


I'd like to offer a helpful suggestion when asking for list help.  It's better 
to provide TOO MUCH information, than too little.  There's a big difference 
between 'my printer won't print' and 'my printer won't print because it's not 
feeding paper properly'.


Of course --- what makes me wonder is that it takes the database relatively
long to insert the rows while foo converting them is relatively fast.

Foo is written in Perl.  I like to think that letting the database do
as much of the work as possible is generally a better idea than doing
things in Perl that the database could do, because the database is likely
to be faster --- without overdoing either, because for practical reasons
things need to be kept sufficiently simple, and unnecessary optimization
is, well, unnecessary.

Now I wonder if my general assumption is false, though foo isn't a good
example to verify it: it can't really do anything else but import the rows,
which takes as long as it takes, and the post-processing is surprisingly fast
(and, because I optimized things, it brings the time taken by the queries that
work with the imported data down from many hours to a few minutes, or even to
seconds and less).

So I don't know ...  I guess 45 minutes to import 600MB of data is reasonably
fast, considering that 2.25 million rows times 40 fields yield 90 million
fields, so that's roughly 33,000 fields/sec.


Re: [CentOS] time foo

2017-12-02 Thread hw

John R Pierce wrote:

On 12/1/2017 11:32 AM, hw wrote:

So this would mean that the database (running on a different server) takes
almost twice as much time as foo --- which I would consider kinda excruciatingly
long because it's merely inserting rows into two different tables after they were
prepared by foo, and then processing some queries to convert the data.

The queries after importing may take like 3 or 5 minutes.  About 4.5 million rows
are being imported.


So you're missing about 25 minutes, and maybe 5 minutes is spent
post-processing, so that's 20 minutes spent in the data insertion?


Yes, and the 15 minutes actually spent in foo were spent on converting
the fields and sending them to the server, which I think is pretty
good.


Inserting one row at a time?  Or in batches?  Remember, a database server is
going to do commits after each transaction, which forces the data to be flushed
to disk.  4.5 million separate row transactions, yeah, I could see that taking
some time, plus add that many network round trips, etc., etc.  If the db server
just has a single SATA disk, you're doing 9 million committed writes combined
to the two tables?  20 minutes for 9 million inserts, that's 7500 per second.


They are inserted one row at a time, during one transaction
for each of the CSV files.  I'd have to figure out how to
insert them in batches; that might yet be faster.  I could
easily stack up 1000 rows or so and then insert them all at
once, if that's possible.
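
That kind of stacking is possible with a multi-row INSERT: collect, say, 1000
rows and send them as one statement, so they cost one round trip instead of a
thousand.  A hedged DBI sketch (again, the PostgreSQL DSN, table and column
names are made up, and the naive split stands in for foo's real CSV parsing):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=mydb;host=dbhost', 'user', 'secret',
                           { AutoCommit => 0, RaiseError => 1 });

    my $batch_size = 1000;
    my @batch;                          # array of array-refs, one per row

    sub flush_batch {
        return unless @batch;
        # Build "INSERT ... VALUES (?,?,?), (?,?,?), ..." for the whole batch.
        my $values = join ', ', map { '(?, ?, ?)' } @batch;
        $dbh->do("INSERT INTO import_tab (a, b, c) VALUES $values",
                 undef, map { @$_ } @batch);
        @batch = ();
    }

    while (my $line = <STDIN>) {        # stand-in for foo's real CSV reader
        chomp $line;
        my @fields = split /,/, $line;  # naive split, just for illustration
        push @batch, [ @fields[0 .. 2] ];
        flush_batch() if @batch >= $batch_size;
    }
    flush_batch();
    $dbh->commit;

If the server happens to be PostgreSQL, the COPY protocol (DBD::Pg's
pg_putcopydata/pg_putcopyend) is usually faster still, and MySQL/MariaDB has
LOAD DATA INFILE; both are database-specific, so the multi-row INSERT above
is the portable middle ground.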