Re: [GENERAL] any solution for doing a data file import spawning it on multiple processes

2012-06-16 Thread h...@101-factory.eu
Thanks all, I will be looking into it. 

Kind regards,

Henk 

On 16 jun. 2012, at 18:23, Edson Richter  wrote:

> On 16/06/2012 12:59, h...@101-factory.eu wrote:
>> Thanks, I thought about splitting the file, but that did not work out well.
>> 
>> We receive 2 files every 30 seconds and need to import them as fast as 
>> possible.
>> 
>> We do not run Java currently, but maybe it's an option.
>> Are you willing to share your code?
>> 
>> I was also thinking of using Perl for it.
>> 
>> 
>> henk
>> 
>> On 16 jun. 2012, at 17:37, Edson Richter  wrote:
>> 
>>> On 16/06/2012 12:04, h...@101-factory.eu wrote:
>>>> Hi there,
>>>> 
>>>> I am trying to import large data files into pg.
>>>> For now I used the xargs Linux command to feed the file line by line and 
>>>> use the maximum available connections.
>>>> 
>>>> We use pgpool as a connection pool to the database, and so try to maximize 
>>>> the concurrent import of the file.
>>>> 
>>>> The problem: it seems to work well, but we miss a line once in a while, 
>>>> and that is not acceptable. It also creates zombie processes ;(.
>>>> 
>>>> Does anybody have any other tricks that will do the job?
>>>> 
>>>> Thanks,
>>>> 
>>>> Henk
>>> I've used a custom Java application with connection pooling (limited to 
>>> 1000 connections, meaning 1000 concurrent file imports).
>>> 
>>> I'm able to import more than 64000 XML files (about 13 KB each) in 5 
>>> minutes, with no memory leaks, no zombies, and (of course) no missing 
>>> records.
>>> 
>>> Besides having each thread import a separate file, I also have a setup 
>>> where separate threads import different lines of the same file. No 
>>> problems at all. Do not forget to check your OS "open files" limit (it was 
>>> a big issue for me in the past due to the Lucene indexes generated during 
>>> import).
>>> 
>>> Server: 8-core Xeon, 16 GB, SAS 15,000 rpm disks, PgSQL 9.1.3, Linux 
>>> CentOS 5, Sun Java 1.6.27.
>>> 
>>> Regards,
>>> 
>>> Edson Richter
>>> 
>>> 
> I'm not allowed to publish my company's code, but the logic is very easy to 
> understand (you will have to "invent" your own solution; the code below is 
> bare bones):
> 
> import java.io.File;
> import java.io.FileFilter;
> 
> class MainThread implements Runnable {
>     // volatile so a stop requested from another thread is seen promptly
>     private volatile boolean keepRunning = true;
> 
>     public void run() {
>         while (keepRunning) {
>             try {
>                 executeFiles();
>                 Thread.sleep(30000); // sleep 30 seconds
>             } catch (Exception ex) {
>                 ex.printStackTrace();
>             }
>         }
>     }
> 
>     private void executeFiles() {
>         File monitorDir = new File("/var/mydatafolder/");
>         File processingDir = new File("/var/myprocessingfolder/");
> 
>         // I'll import only files with names like "data20120621.csv":
>         FileFilter fileFilter = new FileFilter() {
>             public boolean accept(File file) {
>                 boolean isfile = file.isFile() && !file.isHidden() && !file.isDirectory();
>                 if (!isfile) return false;
>                 String fname = file.getName();
>                 return fname.startsWith("data") && fname.endsWith("csv");
>             }
>         };
> 
>         // listFiles returns an array (null if the directory is unreadable)
>         File[] forProcessing = monitorDir.listFiles(fileFilter);
>         if (forProcessing == null) return;
> 
>         for (File fileFound : forProcessing) {
>             // FileUtil is a utility class; you will have to create your own...
>             // the move method will vary according to your operating system
>             FileUtil.move(fileFound, processingDir);
>             // ProcessFile is a class that implements Runnable; do your stuff there...
>             Thread t = new Thread(new ProcessFile(processingDir, fileFound.getName()));
>             t.start();
>         }
>     }
> 
>     /** Use this method to stop the thread from another place in your complex system! */
>     public synchronized void stopWorker() {
>         keepRunning = false;
>     }
> 
>     public static void main(String[] args) {
>         Thread t = new Thread(new MainThread());
>         t.start();
>     }
> }
> 
> 
> 
> 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] any solution for doing a data file import spawning it on multiple processes

2012-06-16 Thread h...@101-factory.eu
Thanks, I thought about splitting the file, but that did not work out well. 

We receive 2 files every 30 seconds and need to import them as fast as 
possible. 

We do not run Java currently, but maybe it's an option. 
Are you willing to share your code?

I was also thinking of using Perl for it.


Henk

On 16 jun. 2012, at 17:37, Edson Richter  wrote:

> On 16/06/2012 12:04, h...@101-factory.eu wrote:
>> Hi there,
>> 
>> I am trying to import large data files into pg.
>> For now I used the xargs Linux command to feed the file line by line and 
>> use the maximum available connections.
>> 
>> We use pgpool as a connection pool to the database, and so try to maximize 
>> the concurrent import of the file.
>> 
>> The problem: it seems to work well, but we miss a line once in a while, 
>> and that is not acceptable. It also creates zombie processes ;(.
>> 
>> Does anybody have any other tricks that will do the job?
>> 
>> Thanks,
>> 
>> Henk
> 
> I've used a custom Java application with connection pooling (limited to 1000 
> connections, meaning 1000 concurrent file imports).
> 
> I'm able to import more than 64000 XML files (about 13 KB each) in 5 minutes, 
> with no memory leaks, no zombies, and (of course) no missing records.
> 
> Besides having each thread import a separate file, I also have a setup where 
> separate threads import different lines of the same file. No problems at all. 
> Do not forget to check your OS "open files" limit (it was a big issue for me 
> in the past due to the Lucene indexes generated during import).
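
(On the "open files" limit: a quick way to check and raise it is sketched 
below; making the change permanent is distribution-specific, e.g. 
/etc/security/limits.conf on CentOS 5, and the PID shown is hypothetical.)

# Show the per-process open-file limit for the current shell
ulimit -n

# Raise it for this session (within the hard limit; root can raise it further)
ulimit -n 8192

# Count the files a running import process currently has open
ls /proc/12345/fd | wc -l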
> 
> Server: 8-core Xeon, 16 GB, SAS 15,000 rpm disks, PgSQL 9.1.3, Linux CentOS 
> 5, Sun Java 1.6.27.
> 
> Regards,
> 
> Edson Richter
> 
> 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] any solution for doing a data file import spawning it on multiple processes

2012-06-16 Thread h...@101-factory.eu
Hi there,

I am trying to import large data files into pg. 
For now I used the xargs Linux command to feed the file line by line and use 
the maximum available connections. 

We use pgpool as a connection pool to the database, and so try to maximize the 
concurrent import of the file. 

The problem: it seems to work well, but we miss a line once in a while, and 
that is not acceptable. It also creates zombie processes ;(. 

Does anybody have any other tricks that will do the job?

Thanks,

Henk
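
(One alternative trick, sketched below rather than taken from the thread: 
split the file into whole-line chunks and load each chunk with its own COPY, 
then wait for all the loaders. split never cuts a line in half, so no line is 
lost, and wait reaps every child, so none is left as a zombie. The database 
and table names are hypothetical.)

#!/bin/bash
# Load one CSV file into PostgreSQL with several parallel COPY streams.
FILE=$1
TABLE=mytable    # hypothetical target table
JOBS=8           # number of parallel loaders

# GNU split: -n l/N makes N chunks without splitting any line.
split -n l/$JOBS "$FILE" /tmp/chunk_

for chunk in /tmp/chunk_*; do
    psql mydb -c "\copy $TABLE from '$chunk' with delimiter ','" &
done

wait    # block until every loader finishes; children are reaped, no zombies
rm -f /tmp/chunk_*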
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: how to set up automatically startup database when the server boot or reboot.

2012-04-27 Thread h...@101-factory.eu
Are there any useful startup scripts for running in a master-slave setup with 
pgpool?

Henk 
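
(As a starting point for the plain-PostgreSQL side of this, a minimal 
init.d-style wrapper is sketched below; the paths and the postgres user are 
assumptions, a production script also needs the distribution's run-level 
headers, and Greenplum ships its own gpstart/gpstop utilities.)

#!/bin/bash
# Minimal sketch of an init.d-style start/stop wrapper for PostgreSQL.
PGDATA=/var/lib/pgsql/data      # assumed data directory
PGCTL=/usr/bin/pg_ctl           # assumed pg_ctl location

case "$1" in
    start)
        su - postgres -c "$PGCTL start -D $PGDATA -l $PGDATA/serverlog"
        ;;
    stop)
        su - postgres -c "$PGCTL stop -D $PGDATA -m fast"
        ;;
    restart)
        su - postgres -c "$PGCTL restart -D $PGDATA -m fast"
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac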


On 27 apr. 2012, at 19:22, leaf_yxj  wrote:

> My OS is Red Hat Linux 5.5, and my database is Greenplum 4.2.1 (PostgreSQL
> 8.2.15). I will take a look at the init.d directory.
> 
> Thanks, guys. Any opinion is welcome. Please help.
> 
> 
> 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Valid query times out when run from bash script

2012-03-27 Thread h...@101-factory.eu
Run the script with bash -v, or bash -x for extra detailed verbose logging. 

See what's wrong; most of the time it is a matter of using the right closure 
of the statements with ' or ".
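
For example, whether a here-document delimiter is quoted changes what psql 
receives (an illustrative sketch; the database, table, and file names are 
hypothetical):

#!/bin/bash
NEW_MTA_FILE=data.csv

# Unquoted delimiter: the shell expands $NEW_MTA_FILE before psql sees it.
psql mydb <<EOF
\copy new_mta from '$NEW_MTA_FILE' with delimiter ','
EOF

# Quoted delimiter: the body is passed verbatim, so psql sees the literal
# text $NEW_MTA_FILE rather than the expanded filename.
psql mydb <<'EOF'
\copy new_mta from '$NEW_MTA_FILE' with delimiter ','
EOF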

Henk Bronk



On 27 mrt. 2012, at 20:37, "W. David Jarvis"  wrote:

> Hello all -
> 
> I've been trying to get a bash script set up that creates a DB, does a
> join, writes the join to a file, and then cleans up after itself.
> Everything within the script runs fine when entered at an actual
> prompt, but when I run the script the SELECT query runs indefinitely.
> I'm not sure what the cause is; the script is being run on a box
> connecting to its own local Postgres installation, so it can't be a
> connection issue. Any thoughts?
> 
> The bash script is included below; the Postgres version is 9.0.4. Any help
> is very much appreciated :)
> 
> #!/bin/bash
> 
> : ${1?"ERROR: Incorrect number of arguments (files have not been properly specified). Proper format is 'mta-join old_mta_file new_mta_file desired_output_filename.'"}
> : ${2?"ERROR: Incorrect number of arguments (files have not been properly specified). Proper format is 'mta-join old_mta_file new_mta_file desired_output_filename.'"}
> : ${3?"ERROR: Incorrect number of arguments (files have not been properly specified). Proper format is 'mta-join old_mta_file new_mta_file desired_output_filename.'"}
> 
> OLD_MTA_FILE=$1
> NEW_MTA_FILE=$2
> OUTPUT_FILE=$3
> 
> # Figure out how many days we need columns for
> export NEW_MTA_COLWIDTH=`head -1 $NEW_MTA_FILE | grep ',' -o | wc -l`
> NEW_MTA_COLWIDTH=$(( (NEW_MTA_COLWIDTH - 7) / 4 ))
> 
> # Assemble the extra SQL for the above
> ADDITIONAL_CREATION_FIELDS=""
> ADDITIONAL_SELECTION_FIELDS=""
> for (( c=0; c<=$NEW_MTA_COLWIDTH; c++ ))
> do
> ADDITIONAL_CREATION_FIELDS=$ADDITIONAL_CREATION_FIELDS"DAY_"$c" varchar(255), event_count_"$c" numeric(20,10), conversions_"$c" numeric(20,10), revenue_"$c" numeric(20,10), "
> 
> ADDITIONAL_SELECTION_FIELDS=$ADDITIONAL_SELECTION_FIELDS"new_mta.DAY_"$c", new_mta.event_count_"$c", new_mta.conversions_"$c", new_mta.revenue_"$c", "
> done
> 
> # Let's get rid of that extra comma at the end.
> 
> ADDITIONAL_CREATION_FIELDS=${ADDITIONAL_CREATION_FIELDS:0:${#ADDITIONAL_CREATION_FIELDS}-2}
> ADDITIONAL_SELECTION_FIELDS=${ADDITIONAL_SELECTION_FIELDS:0:${#ADDITIONAL_SELECTION_FIELDS}-2}
> 
> echo -n "Creating database mta-join..."
> createdb mta-join
> echo -e "Done!\n"
> 
> echo "Creating table new_mta..."
> # PSQL commands start here
> psql mta-join <<EOF
> CREATE TABLE new_mta (
> report_date date,
> campaign_id integer,
> tracking_campaign_id integer,
> placement_id integer,
> creative_id integer,
> package_id  integer,
> audience_id integer,
> $ADDITIONAL_CREATION_FIELDS);
> EOF
> echo -e "Done!\n"
> 
> echo -n "Inserting new mta file into new_mta table..."
> psql mta-join <<EOF
> \copy new_mta from '$NEW_MTA_FILE' with delimiter ','
> \q
> EOF
> echo -e "Done!\n"
> 
> echo "Creating table old_mta..."
> # PSQL commands start here
> psql mta-join <<EOF
> CREATE TABLE old_mta (
> report_date_day date,
> report_date_week date,
> report_date_month   date,
> campaign_name   varchar(255),
> package_namevarchar(255),
> audience_name   varchar(255),
> inventory_provider_name varchar(255),
> placement_name  varchar(255),
> creative_size   varchar(255),
> creative_name   varchar(255),
> impressions bigint,
> data_cost   numeric(20,10),
> media_cost  numeric(20,10),
> gross_cost  numeric(20,10),
> clicks  integer,
> lta_click_actions   integer,
> lta_view_actionsinteger,
> lta_click_revenue   integer,
> lta_view_revenueinteger,
> mta_actions numeric(20,10),
> mta_revenue integer,
> mta_action_count integer,
> mta_seconds integer,
> campaign_id integer,
> placement_id bigint,
> creative_id bigint,
> package_id  bigint,
> audience_id integer);
> \q
> EOF
> echo -e "Done!\n"
> 
> # Upload old MTA file into table old_mta
> echo -n "Inserting old mta file into old_mta table..."
> psql mta-join <<EOF
> \COPY old_mta from '$OLD_MTA_FILE' with delimiter ',' CSV header;
> EOF
> echo -e "Done!\n"
> 
> # Create a bunch of indexes
> echo -n "Creating table indexes for faster querying..."
> psql mta-join <<EOF
> CREATE INDEX campaign_id_index_old ON old_mta (campaign_id);
> CREATE INDEX campaign_id_index_new ON new_mta (campaign_id);
> CREATE INDEX placement_id_index_old ON old_mta (placement_id);
> CREATE INDEX placement_id_index_new ON new_mta (placement_id);
> CREATE INDEX creative_id_index_old ON old_mta (creative_id);
> CREATE INDEX creative_id_index_new ON new_mta (creative_id);
> CREATE INDEX package_id_index_old ON old_mta (package_id);
> CREATE INDEX package_id_index_new ON new_mta (package_id);
> CREATE I