Re: [U2] [u2] Parallel processing in Universe
A Spanner deployment is called a Universe.

-Original Message-
From: Robert Colquhoun robert.colquh...@gmail.com
To: U2 Users List u2-users@listserver.u2ug.org
Sent: Tue, Oct 2, 2012 9:13 pm
Subject: Re: [U2] [u2] Parallel processing in Universe

On Tue, Oct 2, 2012 at 5:58 PM, Symeon Breen syme...@gmail.com wrote:
> However map reduce and hadoop are pretty horrible things. Even Google
> have moved away from it with Caffeine etc.

Going OT a little, I think Google is replacing BigTable, which was part of Caffeine in 2010, with Spanner now. Here is a paper about it released last month:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/spanner-osdi2012.pdf

...amusingly, they call it a Multi-Version Database; can't wait till that gets abbreviated.

- Robert

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] [u2] Parallel processing in Universe
Oracle and SQL Server both use map/reduce internally when doing collations and totals. However, they work differently to U2 in that they have one big process that runs queries from the clients. That process can then cache, multithread and map/reduce. U2 is architected differently in that the client processes (uv or udt processes) actually do the work and the central processes are fairly slim. These client processes are single-threaded, so any multithreading/multiprocessing is part of the application rather than inherent in the database.

One option would be to make U2 a Hadoop-supported data store; you could then map/reduce across multiple instances using whatever Hadoop-supporting toolset you wanted. However, map/reduce and Hadoop are pretty horrible things. Even Google have moved away from it with Caffeine etc.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: 01 October 2012 21:05
To: u2-users@listserver.u2ug.org
Subject: [U2] [u2] Parallel processing in Universe

What's the largest dataset in the Universe user world, in terms of number of records? I'm wondering if we have any potential for utilities that map-reduce. I suppose you would spawn phantoms, but how do they communicate back to the master node?
Re: [U2] [u2] Parallel processing in Universe
On 01/10/12 22:47, Robert Houben wrote:
> Create an index on a dict pointing at the first character of the key,
> and have each phantom take two digits. (0-1, 2-3, 4-5, 6-7, 8-9)

Actually, this is a very BAD way of chopping a file into five even chunks. I'm not sure of the stats, but on any file with sequential keys the first phantom will get the majority of the records, the second the majority of what's left, etc. etc. (The spread of leading digits depends on where the key range happens to end, so the buckets are rarely even.) A lot of people make the mistake of thinking this is a good technique. I'm not even sure it works well with random numbers...

Cheers,
Wol
Re: [U2] [u2] Parallel processing in Universe
On 02/10/12 03:49, Ross Ferris wrote:
> If the file were big enough, and already had part files, then I believe
> that you could have a phantom process each of the individual parts.
> Failing that, get an SSD -- relatively cheap, and it will give your
> processing a reasonable kick along!!

Just be careful with an SSD. If you have a power fail in the middle of your process, this sounds just like the scenario that will trash it. As in totally dead, no recovery possible. SSDs are great, but a power fail during a write can take out the controller. One dead, irrecoverable disk. And if you're hammering the i/o you are VERY vulnerable.

Cheers,
Wol
Re: [U2] [u2] Parallel processing in Universe
What about a striped array of SSDs with a backup battery to flush the write buffer on power fail? No more dangerous (IMO) than an array of hard drives -- but given the limited write cycles of an SSD, that could be more of a danger, unless you're using larger drives and not a lot of data, so the drive has lots of area to fail over to when it reaches its write maximum.
Re: [U2] [u2] Parallel processing in Universe (Unclassified)
Only outside of U2, using UniObjects, can you achieve any type of parallel activity. Through UniObjects we have got 80 processes working from a single Eclipse session through the use of threads in Java. UniObjects creates an individual uvapi_slave or udapi_slave for each of these processes, but the system -- in this case the udapi_server or uvapi_server -- cannot handle as many threads as we would like. We never ran out of memory on our 8GB Windows 2008R2 server, nor did the 120GB SSD fail to keep up with the 80 ANALYZE.FILES or the 80 RESIZE commands we were issuing from our XLr8Resizer product within Eclipse. The only way we got this working was to set the retries to 1000 on reopening the connections. Although that number seems high, it helped and got us from our previous best of 39 processes to 80. When we have a lot of time and cannot think of anything better to do, we will try for 500 processes.

Regards,
Doug
www.u2logic.com
Eclipse based tools for the U2 programmer

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of HENDERSON MIKE, MR
Sent: Tuesday, 2 October 2012 1:18 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe (Unclassified)

I have often thought about this -- mostly in an idle moment, or as a displacement activity from something less amusing that I ought to be doing ;-)

First of all, Universe is already extremely parallel: there's a separate O/S thread for each TTY and for each phantom, and you can't get more parallel than that for interactive processing. So you want more parallelism for your batch processes.

Different applications have different degrees of inherent parallelism. For example, in utility billing systems there is frequently the concept of a group of premises, based on the old concept of a foot-borne meter reader with a 'book' of readings to get. Each 'book' can be processed independently of every other. In payroll, each employee's record can be processed independently. Other areas of commerce have different characteristics.

I think that whatever unit of parallelism you settle on, you'd need three processes: a 'dispatcher' that selects records for processing and queues them into some structure; a set of 'workers' that take queued work items, process them, mark them as processed and put the results in some common store; and a 'monitor' that looks for unprocessed records and indications of stuck processes, and collates the results for final output. I've seen a couple of versions of this, one for electricity billing and another for overnight batch processing of report requests, both well over a decade ago, and neither still in use although their underlying packages are still being run.

The major issue is that these days the whole entity in the general commercial world is far more likely to be I/O-limited than CPU-limited, and therefore introducing parallelism will be no help at all if the I/O system is already choked. Even if the system is currently CPU-limited, multi-threading may not produce much improvement without very careful design of the record locking philosophy -- introducing parallelism will be no help if all the threads end up contending serially for one record lock or a small set of locks.

If you want it to go faster, buy the CPU with the fastest clock you can get (not the one with the most cores), and put your database on SSD like Ross said. The Power7+ chips being announced any day now are rumoured to go to 5GHz+, maybe even more if you have half the cores on the chip disabled.

Regards
Mike

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Ross Ferris
Sent: Tuesday, 2 October 2012 3:50 p.m.
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

If the file were big enough, and already had part files, then I believe that you could have a phantom process each of the individual parts. Failing that, get an SSD -- relatively cheap, and it will give your processing a reasonable kick along!!

Ross Ferris
Stamina Software
Visage -- Better by Design!

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton
Sent: Tuesday, 2 October 2012 8:47 AM
To: 'U2 Users List'
Subject: Re: [U2] [u2] Parallel processing in Universe

OK -- I was trying to create a 'smoother use' of the disk and 'read ahead' -- in this example the disk would be chattering from the heads moving all over the place. I was trying to find a way to make this process more 'orderly' -- is there one?
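Mike's dispatcher/worker/monitor split can be sketched in U2 BASIC. The sketch below is only illustrative -- the WORK.QUEUE file, the 'DONE' flag convention and the PROCESS.ITEM subroutine are invented names, not from any real system -- but it shows the shape of a worker: claim a queued item under a record lock, process it, mark it done.

```basic
* Worker phantom sketch (hypothetical WORK.QUEUE file).
* The dispatcher writes one work item per record; a 'DONE' flag in
* field 1 marks processed items for the monitor to collate.
      OPEN '', 'WORK.QUEUE' TO F.QUEUE ELSE STOP 'Cannot open WORK.QUEUE'
      SELECT F.QUEUE TO 9
      LOOP
         READNEXT ID FROM 9 ELSE EXIT
         * The LOCKED clause makes the lock attempt non-blocking,
         * so workers skip items another phantom already holds
         READU ITEM FROM F.QUEUE, ID LOCKED CONTINUE THEN
            IF ITEM<1> # 'DONE' THEN
               GOSUB PROCESS.ITEM      ;* the real work goes here
               ITEM<1> = 'DONE'
            END
            WRITE ITEM ON F.QUEUE, ID  ;* the WRITE releases the lock
         END ELSE RELEASE F.QUEUE, ID
      REPEAT
      STOP
PROCESS.ITEM:
      * ... application logic for one item ...
      RETURN
   END
```

The monitor would periodically SELECT for records still lacking the 'DONE' flag; anything unprocessed after all workers exit points at a stuck or crashed phantom.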
Re: [U2] [u2] Parallel processing in Universe
On 02/10/12 15:28, George Gallen wrote:
> What about a striped array of SSDs with a backup battery to flush the
> write buffer on power fail?

I guess a backup battery would save you -- basically, anything to prevent power dying in the middle of a write. But the striped array would probably just mean several trashed drives instead of one. It's a known, guaranteed, this-is-what-will-kill-a-drive scenario, and an array would just mean more drives at risk.

The place where I came across a major discussion about this (I knew of the issue earlier) said that some combination of Windows, an update, and a certain laptop was notorious for writing off drives. The update would flood the cache, then the laptop would suspend. Cue one dead drive and, if within warranty, one no-quibble replacement.

Cheers,
Wol
Re: [U2] [u2] Parallel processing in Universe
Yes, the low numbers are used more often. However, if you have sequential keys, just use the *last* two digits instead of the first two.
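For purely numeric sequential keys you do not even need an index: each phantom can filter on the trailing digits (equivalently, MOD of the key), which are evenly spread across sequential keys, unlike leading digits. A hypothetical sketch -- the CUSTOMER file name is invented, and PART.NO/NUM.PARTS would really be passed in to each phantom rather than hard-coded:

```basic
* Phantom PART.NO of NUM.PARTS: process only the keys it owns.
      NUM.PARTS = 5          ;* total number of phantoms
      PART.NO = 0            ;* this phantom's slot, 0..NUM.PARTS-1
      OPEN '', 'CUSTOMER' TO F.CUST ELSE STOP 'Cannot open CUSTOMER'
      SELECT F.CUST TO 9
      LOOP
         READNEXT ID FROM 9 ELSE EXIT
         * Trailing digits are uniform for sequential keys, so the
         * partitions come out close to even
         IF MOD(ID, NUM.PARTS) # PART.NO THEN CONTINUE
         READ REC FROM F.CUST, ID THEN
            * ... process REC ...
         END
      REPEAT
   END
```

Note the trade-off: every phantom still walks the whole key sequence, so this spreads only the record reads and processing, not the cost of the SELECT itself.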
Re: [U2] [u2] Parallel processing in Universe
In my example, I would grab whatever records were hashed into the 'group' -- while it's not perfect, since there is 'overflow', I was just trying to think of a way to break a file into pieces that would otherwise process much like a BASIC select -- just grab the 'group' and go. I can see it's probably not possible, but the topic got me thinking about 'what if'... (And we're UniData, so I have to apply that filter to most everything I read on the list anyway <G>)

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Taylor
Sent: Monday, October 01, 2012 6:10 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

Or, let's suppose you wanted to process repetitive segments of one very large record using the same logic in a separate phantom process for each segment -- how large a record can be read and processed in Universe?

Dave

So how would a user 'chop up' a file for parallel processing? Ideally, if there was a Mod 10001 file (or whatever), it would seem 'ideal' to assign 2000 groups to each of 5 phantoms -- but I don't know how to 'start a BASIC select at group 2001 or 4001'...

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen
Sent: Monday, October 01, 2012 3:29 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

      OPENSEQ '/tmp/pipetest' TO F.PIPE ELSE STOP 'NO PIPE'
      LOOP
         READSEQ LINE FROM F.PIPE ELSE CONTINUE
         PRINT LINE
      REPEAT
      STOP
      END

Although I'm not sure if you might need to sleep a little between the READSEQ's ELSE and the CONTINUE -- it might suck up CPU time when nothing is writing to the file. Then you could set up a printer in UV that did a cat - /tmp/pipetest. Now your phantom just needs to print to that printer.

George

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen
Sent: Monday, October 01, 2012 4:16 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

The only thing about a pipe is that once it's closed, I believe it has to be re-opened by both ends again. So if point A opens one end, and point B opens the other end, once either end closes, it closes for both sides, and both sides would have to reopen it again to use it. To eliminate this, you could have one end open a file, and have the other sides do an append to that file -- just make sure you include some kind of data header so the reading side knows which process just wrote the data.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of u2ug
Sent: Monday, October 01, 2012 4:11 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

pipes
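George's append-to-one-file fan-in -- each phantom appending result lines tagged with its own id so the master can tell who wrote what -- might look like the sketch below. The path, the '|' tag format and the PHANTOM.ID/RESULT variables are all invented for illustration:

```basic
* Phantom side: append one tagged result line to a shared file.
      PHANTOM.ID = 3                    ;* this phantom's id (illustrative)
      RESULT = 'some result data'
      OPENSEQ '/tmp/results.log' TO F.OUT ELSE
         CREATE F.OUT ELSE STOP 'Cannot create results file'
      END
      SEEK F.OUT, 0, 2 ELSE NULL        ;* position at end, so we append
      WRITESEQ PHANTOM.ID : '|' : RESULT TO F.OUT ELSE STOP 'Write failed'
      CLOSESEQ F.OUT
   END
```

The master side just READSEQs the same file and splits each line at the '|'. Concurrent appends from several phantoms can still interleave at the OS level, so a real implementation would probably serialize writers (e.g. through a lock) rather than trust short lines to stay intact.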
Re: [U2] [u2] Parallel processing in Universe
What if you created a duplicate file, did a SELECT, and saved the list (non-sorted)? Each of the phantoms would do a GETLIST and loop through using READNEXT/READU, and if a record were already locked, skip it until it reads an unlocked record (and locks it). Delete the record when finished.
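George's scheme translates fairly directly into BASIC: share one saved list, and let each phantom skip ids whose records another phantom has locked. A sketch with invented file, list and subroutine names; the LOCKED clause is what keeps READU from blocking:

```basic
* Worker phantom: shared saved list, skip locked ids, delete when done.
      OPEN '', 'WORK.COPY' TO F.WORK ELSE STOP 'Cannot open WORK.COPY'
      GETLIST 'BATCH.LIST' TO 9 ELSE STOP 'No saved list BATCH.LIST'
      LOOP
         READNEXT ID FROM 9 ELSE EXIT
         * LOCKED CONTINUE: another phantom owns it, move on
         READU REC FROM F.WORK, ID LOCKED CONTINUE THEN
            GOSUB PROCESS.REC       ;* hypothetical: the real work
            DELETE F.WORK, ID       ;* marks it done and frees the lock
         END ELSE RELEASE F.WORK, ID
      REPEAT
      STOP
PROCESS.REC:
      * ... application logic for one record ...
      RETURN
   END
```

Because the DELETE removes the record from the duplicate file, a skipped id never needs revisiting: once its owner finishes, the record is simply gone.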
Re: [U2] [u2] Parallel processing in Universe
AH -- you would not even have to 'delete' as long as the 'locks' are held long enough -- meaning if you know you will have 20 phantoms, each phantom would keep a list of 'keys locked', and once it hits 21 (or 40 if you want insurance LOL) in the list, would unlock the earliest lock -- that way there is no way any other phantom could process anything twice... As each phantom runs, if it hits a locked record, it would move to the next item in the list. Great idea!

DW
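David's rolling window of held locks might be sketched as below: each phantom keeps its last MAX.HELD processed ids locked and releases only the oldest as it goes, so a sibling that skipped a locked record can never see it unlocked again soon enough to re-process it. MAX.HELD and the names are invented; the fragment assumes F.WORK is already open, a select list is active in list 9, and a PROCESS.REC subroutine exists elsewhere in the program:

```basic
* Rolling window of held locks (illustrative fragment).
      MAX.HELD = 21                ;* one more than the 20 phantoms
      HELD = ''                    ;* dynamic array of ids still locked
      LOOP
         READNEXT ID FROM 9 ELSE EXIT
         READU REC FROM F.WORK, ID LOCKED CONTINUE THEN
            GOSUB PROCESS.REC      ;* hypothetical: the real work
            HELD<-1> = ID          ;* keep the lock, remember the id
            IF DCOUNT(HELD, @AM) > MAX.HELD THEN
               RELEASE F.WORK, HELD<1>   ;* free only the oldest lock
               DEL HELD<1>
            END
         END ELSE RELEASE F.WORK, ID
      REPEAT
      * Drain the window once the list is exhausted
      LOOP WHILE DCOUNT(HELD, @AM) DO
         RELEASE F.WORK, HELD<1>
         DEL HELD<1>
      REPEAT
```

One caveat a real implementation would need to face: if a phantom dies, its held locks vanish with it, and the records it skipped-then-never-revisited would need the monitor process to sweep them up.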
Re: [U2] [u2] Parallel processing in Universe
You've highlighted one problem here. By having multiple processes accessing the disk in different locations, you destroy cache optimization and seek times. More phantoms = less performance. This assumes I/O is a bigger concern than CPU, which is generally the case. More phantoms = more communication, which also adds another overhead that reduces performance. Introducing more phantoms than CPU cores, you increase the amount of context switching, which ones again hurts your cache usage as well as adding bigger overheads on the CPU again. In short, except for very specific cases, increasing 'concurrency' through phantoms on a single machine is generally ill-advised, resulting in longer processing times, higher average system loads and worse yet, greater system complexity (and hence ways for things to break). As mentioned earlier, more system-level architectural changes (such as multiple machines, or at least, files storage on different disks/spindles for each process) are required if you want benefit from this sort of work. -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton Sent: Monday, October 01, 2012 4:47 PM To: 'U2 Users List' Subject: Re: [U2] [u2] Parallel processing in Universe OK - I was trying to create a 'smoother use' of the disk and 'read ahead' -- this example the disk would be chattering from the heads moving all over the place. I was trying to find a way to make this process more 'orderly' -- is there one? -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Robert Houben Sent: Monday, October 01, 2012 4:48 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe Create an index on a dict pointing at the first character of the key, and have each phantom take two digits. 
(0-1, 2-3, 4-5, 6-7, 8-9) -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton Sent: October-01-12 2:43 PM To: 'U2 Users List' Subject: Re: [U2] [u2] Parallel processing in Universe So how would a user 'chop up' a file for parallel processing? Ideally, if here was a Mod 10001 file (or whatever) it would seem like it would be 'ideal' to assign 2000 groups to 5 phantoms -- but I don't know how 'start a BASIC select at Group 2001 or 4001' ... -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen Sent: Monday, October 01, 2012 3:29 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe 0001: OPENSEQ /tmp/pipetest TO F.PIPE ELSE STOP NO PIPE 0002: LOOP 0003:READSEQ LINE FROM F.PIPE ELSE CONTINUE 0004:PRINT LINE 0005: REPEAT 0006: STOP 0007: END Although, not sure if you might need to sleep a litte between the READSEQ's ELSE and CONTINUE Might suck up cpu time when nothing is writing to the file. Then you could setup a printer in UV that did a cat - /tmp/pipetest Now your phantom just needs to print to that printer. George -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen Sent: Monday, October 01, 2012 4:16 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe The only thing about a pipe is that once it's closed, I believe it has to be re-opened by both Ends again. So if point a opens one end, and point b opens the other end, once either end closes, It closes for both sides, and both sides would have to reopen again to use. To eliminate this, you could have one end open a file, and have the other sides do a append To that file - just make sure you include some kind of dataheader so the reading side knows which Process just wrote the data. 
-Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of u2ug Sent: Monday, October 01, 2012 4:11 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe

pipes

-Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson Sent: Monday, October 01, 2012 4:05 PM To: u2-users@listserver.u2ug.org Subject: [U2] [u2] Parallel processing in Universe

What's the largest dataset in the Universe user world? In terms of number of records. I'm wondering if we have any potential for utilities that map-reduce. I suppose you would spawn phantoms, but how do they communicate back to the master node?

___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
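[Editor's note: a minimal UniVerse BASIC sketch of Robert Houben's first-character partitioning idea above. The CUSTOMER file name, the slice bounds, and the counter stand-in are invented for illustration; this is not code from the thread.]

```basic
* Hypothetical phantom worker for the first-character scheme: it walks
* the file but processes only keys whose first character falls in this
* phantom's slice. CUSTOMER is an invented file name.
      OPEN '', 'CUSTOMER' TO F.CUST ELSE STOP 'Cannot open CUSTOMER'
      LO = '0' ; HI = '1'      ;* this phantom's two leading digits
      CNT = 0
      SELECT F.CUST
      LOOP
         READNEXT ID ELSE EXIT
         C1 = ID[1,1]          ;* first character of the record key
         IF C1 < LO OR C1 > HI THEN CONTINUE
         READ REC FROM F.CUST, ID THEN
            CNT = CNT + 1      ;* stand-in for the real application work
         END
      REPEAT
      PRINT 'Processed ':CNT:' records in slice ':LO:'-':HI
      STOP
      END
```

With the secondary index Robert suggests, each phantom could instead SELECT only its slice via the index and avoid walking the whole key set.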
Re: [U2] [u2] Parallel processing in Universe
From: David Wolverton Sent: Tuesday, October 02, 2012 11:19 AM

Great point!! I think we can agree that 'spinning media latency' is the enemy, and having phantoms increase the 'head dance' can make things worse, not better! Many problems go away or become trivial as spinning media rides off into the sunset. I've advised customers that just moving 'code files' to a tiny SSD would likely increase overall system performance on Windows boxes. Just waiting until the price for Enterprise SSDs makes them a no-brainer... Until then, even small SSDs will help!
Re: [U2] [u2] Parallel processing in Universe
From: Daniel McGrath Sent: Tue, Oct 2, 2012 10:32 AM

Yes, SSD will definitely help. Just keep in mind, it doesn't prevent all the negatives with regard to I/O, particularly caching. Disk caching in a modern system is fairly complex, but at a high level it is done not only by the controller but by the OS as well. So randomly flying around the disk still causes cache thrashing. :(
Re: [U2] [u2] Parallel processing in Universe
From: Wjhonson Sent: Tuesday, October 02, 2012 1:35 PM

The idea of the phantoms would be to read the file in order, not randomly - just in order from five different starting points. So you should still get the benefit of some caching.
Re: [U2] [u2] Parallel processing in Universe
From: George Gallen Sent: Tue, Oct 2, 2012 10:39 AM

If 5 phantoms were running and read in order, but from 5 different starting points, the records would essentially still be processed in a random order if you were to lay out the record IDs as they get processed. George
Re: [U2] [u2] Parallel processing in Universe
From: Wjhonson Sent: Tuesday, October 02, 2012 1:42 PM

The point of the caching concern is related to the read-ahead, and you will still get some benefit from this if your five phantoms are reading their *portion* of the file in order, which they should.
Re: [U2] [u2] Parallel processing in Universe
From: George Gallen Sent: Tuesday, October 02, 2012 12:55 PM

OK, I see what you're saying... I'll buy that.
Re: [U2] [u2] Parallel processing in Universe
From: David Wolverton Sent: Tue, Oct 2, 2012 10:59 AM

Which was my question -- was there a way to 'jump to' a group, or a BASIC SELECT with a 'starting/ending' group -- so that again, with a modulo 10001 file, one phantom does 'groups' 1-2000, the next phantom does 'groups' 2001-4000, and so on. But I can't see that it's really possible without jumping through hoops that make it unattractive at best! At least on UniData! DW
Re: [U2] [u2] Parallel processing in Universe
From: Wjhonson

You may not need to know what *group* you are in per se, if you are willing to use the file stats record. You can determine from the last stats how many records are in your file. Then your master program just reads the keys until it gets to the 50,000th key (or whatever), and then spawns a phantom, telling it which key to start with and how many keys to process before it ends. Or maybe you don't need the stat file, if UniData has @SELECTED to tell you how many keys there are.
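[Editor's note: a hedged UniVerse BASIC sketch of the master/phantom split described above. PROCESS.SLICE is a hypothetical cataloged program, and the CUSTOMER file name and chunk size are invented; a real PROCESS.SLICE would have to re-select the file, skip to its starting key, and process its share of records.]

```basic
* Master sketch: walk the keys in group order and hand each chunk of
* CHUNK keys to a phantom, passing the starting key and the count.
      OPEN '', 'CUSTOMER' TO F.CUST ELSE STOP 'Cannot open CUSTOMER'
      CHUNK = 50000
      N = 0 ; START.ID = ''
      SELECT F.CUST
      LOOP
         READNEXT ID ELSE EXIT
         IF N = 0 THEN START.ID = ID
         N = N + 1
         IF N = CHUNK THEN
            EXECUTE 'PHANTOM PROCESS.SLICE ':START.ID:' ':N
            N = 0
         END
      REPEAT
      IF N > 0 THEN EXECUTE 'PHANTOM PROCESS.SLICE ':START.ID:' ':N
      END
```

Note the caveat raised in the thread: the master still reads every key once, so this pays the full select cost up front before any phantom starts.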
Re: [U2] [u2] Parallel processing in Universe
Could also avoid the lock contention if each phantom had knowledge of the others, so phantom 1 could only process @ID 1, 6, 11, etc., and phantom 2 would do 2, 7, 12, and so on. Of course, if you are operating with a select list, this already implies that you have processed the file once, so your batch process is actually a re-read; in the absence of a suitable index, perhaps employing the Drumheller trick would be worth consideration.

Ross Ferris Stamina Software Visage Better by Design!
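[Editor's note: the interleaving idea above can be sketched in UniVerse BASIC. This assumes purely numeric record keys; the ORDERS file name and the command-line argument passing are invented for illustration.]

```basic
* Hypothetical phantom worker number ME of NP.PHANTOMS: it touches only
* records where MOD(key, NP.PHANTOMS) = ME, so no two phantoms ever
* contend for the same record lock.
      NP.PHANTOMS = 5
      ME = FIELD(@SENTENCE, ' ', 2)   ;* phantom number 0..4, assumed passed on the command line
      OPEN '', 'ORDERS' TO F.ORD ELSE STOP 'Cannot open ORDERS'
      SELECT F.ORD
      LOOP
         READNEXT ID ELSE EXIT
         IF MOD(ID, NP.PHANTOMS) # ME THEN CONTINUE
         READU REC FROM F.ORD, ID THEN
            RELEASE F.ORD, ID         ;* real work would update REC and WRITE instead
         END
      REPEAT
      END
```

Because the ID sets are disjoint, the READU locks never collide between phantoms, which is the point of the scheme.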
Re: [U2] [u2] Parallel processing in Universe
Depends on what you call a no-brainer -- to me, $4K for an 800GB Intel 910 SSD seems reasonable for what you get (10x full-drive writes every day for 5 years has the endurance angle covered IMHO - 400GB is $2K if your database will fit), and by today's standards it represents reasonable value. Not quite at the performance level of Fusion-io, but cheap enough to just about be affordable.

Ross Ferris Stamina Software Visage Better by Design!
Re: [U2] [u2] Parallel processing in Universe
On Tue, Oct 2, 2012 at 5:58 PM, Symeon Breen syme...@gmail.com wrote: However map reduce and hadoop are pretty horrible things. Even Google have moved away from it with Caffeine etc.

Going OT a little, I think Google is now replacing BigTable (which was part of Caffeine in 2010) with Spanner. Here is a paper about it released last month: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/spanner-osdi2012.pdf ...amusingly, they call it a Multi-Version Database; can't wait till that gets abbreviated. - Robert
[U2] [u2] Parallel processing in Universe
What's the largest dataset in the Universe user world, in terms of number of records? I'm wondering if we have any potential for utilities that map-reduce. I suppose you would spawn phantoms, but how do they communicate back to the master node?
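One common pattern for the master/phantom communication is for the master to launch workers with EXECUTE and have each one write its partial result to a shared hashed file that the master polls. A minimal sketch - the RESULTS file, BP WORKER program and PART.* record keys are all hypothetical names:

```basic
* MASTER: launch N worker phantoms, then poll a shared RESULTS file
* until every worker has written its partial result record.
OPEN 'RESULTS' TO F.RESULTS ELSE STOP 'NO RESULTS FILE'
N.WORKERS = 4
FOR W = 1 TO N.WORKERS
   EXECUTE 'PHANTOM RUN BP WORKER ':W   ;* worker can read its number back from @SENTENCE
NEXT W
LOOP
   DONE = 0
   FOR W = 1 TO N.WORKERS
      READ PARTIAL FROM F.RESULTS, 'PART.':W THEN DONE = DONE + 1
   NEXT W
UNTIL DONE = N.WORKERS DO
   SLEEP 1   ;* wait a second between polls
REPEAT
```

Each worker would end with something like WRITE PARTIAL TO F.RESULTS, 'PART.':W, and the master then combines the partial results - the "reduce" step.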
Re: [U2] [u2] Parallel processing in Universe
pipes

-Original Message- From: Wjhonson Sent: Monday, October 01, 2012 4:05 PM To: u2-users@listserver.u2ug.org Subject: [U2] [u2] Parallel processing in Universe
Re: [U2] [u2] Parallel processing in Universe
The only thing about a pipe is that once it's closed, I believe it has to be re-opened by both ends. So if point A opens one end and point B opens the other, once either end closes, it closes for both sides, and both sides have to reopen it to use it again. To get around this, you could have one end open a file and have the other sides append to that file - just make sure you include some kind of data header so the reading side knows which process wrote the data.

-Original Message- From: u2ug Sent: Monday, October 01, 2012 4:11 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe
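A sketch of the append-to-a-file variant George describes - the path and the WORKER.ID/RESULT variables are made up, and it assumes each worker writes one whole line per result:

```basic
* WORKER side: append one tagged line per result so the reading side
* can tell which process wrote it.
OPENSEQ '/tmp/workers.log' TO F.LOG ELSE
   CREATE F.LOG ELSE STOP 'CANNOT CREATE LOG'
END
SEEK F.LOG, 0, 2 ELSE NULL               ;* relto 2 = position at end of file
WRITESEQ WORKER.ID:'|':RESULT TO F.LOG ELSE STOP 'WRITE FAILED'
CLOSESEQ F.LOG
```

One caution: concurrent appends from several processes can interleave unless each write really is a single short line; writing one record per worker into a shared hashed file avoids that problem entirely.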
Re: [U2] [u2] Parallel processing in Universe
OPENSEQ '/tmp/pipetest' TO F.PIPE ELSE STOP 'NO PIPE'
LOOP
   READSEQ LINE FROM F.PIPE ELSE CONTINUE
   PRINT LINE
REPEAT
STOP
END

Although I'm not sure if you might need to sleep a little between the READSEQ's ELSE and the CONTINUE - it might suck up CPU time when nothing is writing to the file. Then you could set up a printer in UV that did a cat - /tmp/pipetest. Now your phantom just needs to print to that printer.

George

-Original Message- From: George Gallen Sent: Monday, October 01, 2012 4:16 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe
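The sleep George mentions could look like this, assuming UniVerse's NAP statement (which pauses in milliseconds) is available on your platform:

```basic
OPENSEQ '/tmp/pipetest' TO F.PIPE ELSE STOP 'NO PIPE'
LOOP
   READSEQ LINE FROM F.PIPE THEN
      PRINT LINE
   END ELSE
      NAP 250   ;* nothing to read - wait 250 ms instead of spinning the CPU
   END
REPEAT
```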
Re: [U2] [u2] Parallel processing in Universe
So how would a user 'chop up' a file for parallel processing? Ideally, if there was a Mod 10001 file (or whatever), it would seem 'ideal' to assign 2000 groups to each of 5 phantoms -- but I don't know how to 'start a BASIC select at Group 2001 or 4001' ...

-Original Message- From: George Gallen Sent: Monday, October 01, 2012 3:29 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe
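One way to sidestep the 'start at group N' problem is to partition by key rather than by group: every phantom scans the full key list but only processes its own slice. A sketch - the CUSTOMER file and PROCESS.REC subroutine are hypothetical, and note that this spreads the CPU work, not the disk reads, since each phantom still walks the whole file's keys:

```basic
* WORKER: process only the keys whose checksum falls in this slice.
MY.SLICE = FIELD(@SENTENCE, ' ', 4)   ;* e.g. launched as: PHANTOM RUN BP WORKER 2
N.SLICES = 5
OPEN 'CUSTOMER' TO F.CUST ELSE STOP 'NO FILE'
SELECT F.CUST
LOOP
   READNEXT ID ELSE EXIT
   IF MOD(CHECKSUM(ID), N.SLICES) + 1 = MY.SLICE THEN
      READ REC FROM F.CUST, ID THEN GOSUB PROCESS.REC
   END
REPEAT
```

Using a checksum of the key rather than the file's own group hashing keeps the slices roughly even regardless of key format.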
Re: [U2] [u2] Parallel processing in Universe
Create an index on a dict pointing at the first character of the key, and have each phantom take two digits. (0-1, 2-3, 4-5, 6-7, 8-9)

-Original Message- From: David Wolverton Sent: October-01-12 2:43 PM To: 'U2 Users List' Subject: Re: [U2] [u2] Parallel processing in Universe
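Sketched out, with a hypothetical I-type dictionary item KEY1 that returns the first character of the record ID:

```basic
* DICT CUSTOMER 'KEY1' (hypothetical I-descriptor): @ID[1,1]
* Build the index once at TCL:
*    CREATE.INDEX CUSTOMER KEY1
*    BUILD.INDEX CUSTOMER KEY1
* Each phantom then selects only its share of the keyspace:
EXECUTE 'SELECT CUSTOMER WITH KEY1 = "0" OR WITH KEY1 = "1"'
LOOP
   READNEXT ID ELSE EXIT
   * ... process record ID ...
REPEAT
```

The indexed select hands each phantom a disjoint key range, so no two workers touch the same record.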
Re: [U2] [u2] Parallel processing in Universe
OK - I was trying to create a 'smoother use' of the disk and 'read ahead' -- in this example the disk would be chattering from the heads moving all over the place. I was trying to find a way to make this process more 'orderly' -- is there one?

-Original Message- From: Robert Houben Sent: Monday, October 01, 2012 4:48 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe
Re: [U2] [u2] Parallel processing in Universe
The GROUP.STAT.DETAIL command will tell you the keys, in stored order, in each group of a hashed file.

-Original Message- From: David Wolverton dwolv...@flash.net To: 'U2 Users List' u2-users@listserver.u2ug.org Sent: Mon, Oct 1, 2012 3:47 pm Subject: Re: [U2] [u2] Parallel processing in Universe
Re: [U2] [u2] Parallel processing in Universe
True - but why would you want it any other way? Once one end closes it, the process is complete.

-Original Message- From: George Gallen Sent: Monday, October 01, 2012 4:16 PM To: U2 Users List Subject: Re: [U2] [u2] Parallel processing in Universe
Re: [U2] [u2] Parallel processing in Universe
Or, let's suppose you wanted to process repetitive segments of one very large record, using the same logic in a separate phantom process for each segment - how large a record can be read and processed in Universe?

Dave
Re: [U2] [u2] Parallel processing in Universe
If the file were big enough and already had part files, then I believe you could have a phantom process each of the individual parts. Failing that, get an SSD - relatively cheap, and it will give your processing a reasonable kick along!!

Ross Ferris Stamina Software Visage Better by Design!

-Original Message- From: David Wolverton Sent: Tuesday, 2 October 2012 8:47 AM To: 'U2 Users List' Subject: Re: [U2] [u2] Parallel processing in Universe
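With a UniVerse distributed file, each part is itself an openable file, so 'one phantom per part' can be as simple as the following sketch (the CUSTOMER.PART naming convention here is hypothetical - use whatever your part files are actually called):

```basic
* WORKER: scan one part of a distributed file directly.
PART.NO = FIELD(@SENTENCE, ' ', 4)   ;* part number passed on the PHANTOM command line
OPEN 'CUSTOMER.PART':PART.NO TO F.PART ELSE STOP 'NO PART FILE'
SELECT F.PART
LOOP
   READNEXT ID ELSE EXIT
   * ... process record ID ...
REPEAT
```

Because the parts are separate files (ideally on separate spindles), this also addresses David's disk-chatter concern in a way that slicing one hashed file cannot.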
Re: [U2] [u2] Parallel processing in Universe (Unclassified)
I have often thought about this - mostly in an idle moment, or as a displacement activity for something less amusing that I ought to be doing. ;-)

First of all, Universe is already extremely parallel: there's a separate O/S process for each TTY and for each phantom, and you can't get more parallel than that for interactive processing. So what you want is more parallelism for your batch processes.

Different applications have different degrees of inherent parallelism. For example, in utility billing systems there is frequently the concept of a group of premises - based on the old concept of a foot-borne meter reader with a 'book' of readings to get. Each 'book' can be processed independently of every other. In payroll, each employee's record can be processed independently. Other areas of commerce have different characteristics.

I think that whatever unit of parallelism you settle on, you'd need three processes: a 'dispatcher' that selects records for processing and queues them into some structure; a set of 'workers' that take queued work items, process them, mark them as processed and put the results in some common store; and a 'monitor' that looks for unprocessed records and indications of stuck processes, and collates the results for final output.

I've seen a couple of versions of this, one for electricity billing and another for overnight batch processing of report requests, both well over a decade ago; neither is still in use, although their underlying packages are still being run.

The major issue is that these days the system as a whole is, in the general commercial world, far more likely to be I/O-limited than CPU-limited, and therefore introducing parallelism will be no help at all if the I/O subsystem is already choked.
Even if the system is currently CPU-limited, multi-threading may not produce much improvement without very careful design of the record-locking philosophy - introducing parallelism will be no help if all the threads end up contending serially for one record lock or a small set of locks.

If you want it to go faster, buy the CPU with the fastest clock you can get (not the one with the most cores), and put your database on SSD as Ross said. The Power7+ chips being announced any day now are rumoured to reach 5GHz+, maybe even more if you have half the cores on the chip disabled.

Regards
Mike

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Ross Ferris
Sent: Tuesday, 2 October 2012 3:50 p.m.
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

If the file were big enough, and already had part files, then I believe that you could have a phantom process each of the individual parts. Failing that, get an SSD - relatively cheap, and it will give your processing a reasonable kick along!

Ross Ferris
Stamina Software
Visage - Better by Design!

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton
Sent: Tuesday, 2 October 2012 8:47 AM
To: 'U2 Users List'
Subject: Re: [U2] [u2] Parallel processing in Universe

OK - I was trying to create a 'smoother use' of the disk and 'read ahead' -- in this example the disk would be chattering with the heads moving all over the place. I was trying to find a way to make this process more 'orderly' -- is there one?

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Robert Houben
Sent: Monday, October 01, 2012 4:48 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

Create an index on a dict pointing at the first character of the key, and have each phantom take two digits. 
(0-1, 2-3, 4-5, 6-7, 8-9)

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton
Sent: October-01-12 2:43 PM
To: 'U2 Users List'
Subject: Re: [U2] [u2] Parallel processing in Universe

So how would a user 'chop up' a file for parallel processing? Ideally, if there were a Mod 10001 file (or whatever) it would seem 'ideal' to assign 2000 groups to each of 5 phantoms -- but I don't know how to 'start a BASIC select at group 2001 or 4001' ...

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen
Sent: Monday, October 01, 2012 3:29 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

   OPENSEQ '/tmp/pipetest' TO F.PIPE ELSE STOP 'NO PIPE'
   LOOP
      READSEQ LINE FROM F.PIPE ELSE CONTINUE
      PRINT LINE
   REPEAT
   STOP
   END

Although I'm not sure if you might need to sleep a little between the READSEQ's ELSE and the CONTINUE - it might suck up CPU time when nothing is writing to the file. Then you could set up a printer in UV that did a cat - /tmp/pipetest. Now your phantom
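Robert's "two leading digits per phantom" split can be sketched generically. This is a hypothetical Python illustration only (it assumes purely numeric record keys, as the 0-9 buckets above do); in U2 the same partition would be driven by an I-descriptor on the first character of @ID plus a secondary index.

```python
# Hypothetical sketch of partitioning record keys across five
# workers by the first digit of the key: 0-1 -> worker 0,
# 2-3 -> worker 1, ... 8-9 -> worker 4.
def partition_for(key):
    first = int(str(key)[0])      # leading digit of the record key
    return first // 2             # pair the digits up: two per worker

def split_keys(keys, n_phantoms=5):
    # Bucket a selected key list so each phantom gets its own slice.
    buckets = [[] for _ in range(n_phantoms)]
    for k in keys:
        buckets[partition_for(k)].append(k)
    return buckets
```

David's alternative (carving a Mod 10001 file into ranges of 2000 groups per phantom) is the same idea at a different granularity: partition on the group a key hashes to rather than on its leading digit, which keeps each phantom's reads clustered within its own groups.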
Re: [U2] [u2] Parallel processing in Universe (Unclassified)
Interestingly, I'm currently trying to find a definitive answer on the correlation between clock speed and single-core/single-thread performance on Intel CPUs - to confirm, or deny, that for grunt batch work a 4-core Intel running at 3.4GHz will actually be faster than an 8-core running at 2.7GHz. The answer isn't as straightforward (or as easy to find) as I would have hoped, as even within the same family (E5-2600) there can be architectural differences that come into play. If anyone has a definitive answer, please feel free to share!

Ross Ferris
Stamina Software
Visage - Better by Design!

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of HENDERSON MIKE, MR
Sent: Tuesday, 2 October 2012 1:18 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe (Unclassified)