Re: [U2] STARTUP file issue with UV11.1 PE version (Linux)
The installation instructions from Rocket are quite good and do indeed mention the need to use "cpio" on UNIX servers. See "Quick Installation" and "Step-by-step Instructions" in NEWINSTALL.PDF. However, the instructions could be improved with a minor revision, as the installation guide assumes you're using a CD-ROM or tape drive to get the installation software onto your system. It is not clear that after you download the software archive file from the Internet, you can upload it (as a binary file) to your UNIX host. Once there, un-zip it (preferably as root) and then perform the "cpio" step after that.

On Tuesday, 2 October 2012 3:43 AM, doug chanco wrote:

No sir, I did not know that. Why would they cpio it anyway? Not that it matters, I was just curious. Anyway, thanks for the info.

Dougc

On Monday, October 01, 2012 1:19 PM, Brian Leach wrote:

Doug

Have you remembered that STARTUP is a cpio archive?

# cpio -uvcdumB uv.load < STARTUP
./uv.load

On 01 October 2012 17:37, doug chanco wrote:

I recently downloaded UV 11 and when I went to run STARTUP I got a weird error. On looking at the STARTUP script I noticed it had a bunch of binary and other junk at the beginning of the file. I removed all the "extra" stuff, saved the file and it ran just fine. Has anyone else seen this? I re-downloaded the zip and still had this issue. It was easy enough to resolve but I thought I would mention it.

Dougc
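For anyone following along, a minimal sketch of the transfer-and-extract sequence David describes. The archive and host names here are assumptions, not Rocket's actual file names, and the cpio option letters should come from NEWINSTALL.PDF; the point is the binary transfer and the extract-then-run order:

    # on your workstation: transfer the downloaded zip to the UNIX host in BINARY mode
    ftp unixhost
    ftp> binary
    ftp> put uv11pe_linux.zip /tmp/uv11pe_linux.zip
    ftp> bye

    # on the UNIX host, preferably as root
    cd /tmp
    unzip uv11pe_linux.zip        # produces the STARTUP cpio archive, among other files
    cpio -icvBdum < STARTUP       # exact option letters per NEWINSTALL.PDF
    ./uv.load                     # then run the installer as the guide describes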
Re: [U2] [u2] Parallel processing in Universe
Oracle and SQL Server both use map-reduce internally when doing collations and totals. However, they work differently to U2 in that they have one big process that runs queries from the clients. That process can then cache, multithread and map-reduce. U2 is architected differently: the client process (the uv or udt process) actually does the work, and the central udt processes are fairly slim. These client processes are single-threaded, so any multithreading/multiprocessing is part of the application rather than inherent in the database.

One option is to make U2 a Hadoop-supported data store; you could then map-reduce across multiple instances using whatever Hadoop-supporting toolset you wanted. However, map-reduce and Hadoop are pretty horrible things. Even Google have moved away from it with Caffeine etc.

On Monday, 1 October 2012, Wjhonson wrote:
> What's the largest dataset in the Universe user world, in terms of number of records?
> I'm wondering if we have any potential for utilities that map-reduce. I suppose you
> would spawn phantoms, but how do they communicate back to the master node?
Re: [U2] STARTUP file issue with UV11.1 PE version (Linux)
On 02/10/12 08:23, Hona, David wrote:
> The installation instructions from Rocket are quite good and do indeed mention
> the need to use "cpio" on UNIX servers...

Does it also mention that the default cpio options have changed? Okay, if you're installing on a non-supported Linux why would Rocket worry, but if you do get problems I think it's the -B option. Something to do with block size, anyway. The default meaning has reversed, so the script isn't portable across different Linux versions. All you have to do, though, is add or remove the changed option.

Cheers,
Wol
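If the extraction fails with a block-size or premature end-of-file complaint, one simple way to test Wol's point is to try it both with and without -B. A sketch; the surrounding option letters are an assumption loosely based on the command quoted earlier in the thread, so use whatever NEWINSTALL.PDF actually specifies:

    # try with the documented block-size option first...
    cpio -icvBdum < STARTUP
    # ...and if cpio complains, retry without -B (or add it, if it was omitted)
    cpio -icvdum < STARTUP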
Re: [U2] [u2] Parallel processing in Universe
On 01/10/12 22:47, Robert Houben wrote:
> Create an index on a dict pointing at the first character of the key, and
> have each phantom take two digits. (0-1, 2-3, 4-5, 6-7, 8-9)

Actually, this is a very BAD way of chopping a file into five even chunks. I'm not sure of the stats, but on any file with sequential keys, the first phantom will get the majority of the records, the second gets the majority of what's left, etc. A lot of people make the mistake of thinking this is a good technique. I'm not even sure it works well with random numbers...

Cheers,
Wol
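A throwaway UniVerse BASIC sketch that shows the skew Wol describes; the key range 1..12000 is just an arbitrary example of sequential keys:

    * Count how many of the sequential keys 1..12000 begin with each digit.
    * With sequential keys the low leading digits dominate, so a split on the
    * first character of the key gives very uneven chunks.
    DIM COUNTS(9)
    MAT COUNTS = 0
    FOR K = 1 TO 12000
       D = K[1,1]                      ;* first character of the key
       COUNTS(D) = COUNTS(D) + 1
    NEXT K
    FOR D = 1 TO 9
       PRINT "Keys starting with ":D:" = ":COUNTS(D)
    NEXT D
    END

For 1..12000, keys starting with "1" number 3112 while those starting with "9" number only 1111, so the "first phantom" really does get far more than its share.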
Re: [U2] [u2] Parallel processing in Universe
On 02/10/12 03:49, Ross Ferris wrote:
> If the file were big enough, and already had part files, then I believe that
> you could have a phantom process each of the individual parts. Failing that,
> get an SSD - relatively cheap, and it will give your processing a reasonable
> kick along!!

Just be careful with an SSD. If you get a power failure in the middle of your process, this sounds like exactly the scenario that will trash it. As in, totally dead, no recovery possible.

SSDs are great, but a power fail during a write can take out the controller. One dead, irrecoverable disk. And if you're hammering the I/O you are VERY vulnerable.

Cheers,
Wol
Re: [U2] [u2] Parallel processing in Universe
What about a striped array of SSDs with a backup battery to flush the write buffer on power fail? No more dangerous (IMO) than an array of hard drives. But given the limited write endurance of an SSD, that could be more of a danger, unless you're using larger drives and not a lot of data, so the drive has lots of area to fail over to when it reaches its write maximum.

On Tuesday, October 02, 2012, Wols Lists wrote:
> Just be careful with an SSD. If you get a power failure in the middle of
> your process, this sounds like exactly the scenario that will trash it...
Re: [U2] [u2] Parallel processing in Universe (Unclassified)
Only outside of U2, using UniObjects, can you achieve any type of parallel activity. We have, through UniObjects, got 80 processes working from a single Eclipse session through the use of threads in Java. UniObjects creates an individual uvapi_slave or udapi_slave for each of these processes, but the system, or in this case the udapi_server or uvapi_server, cannot handle as many threads as we would like. We never ran out of memory on our 8GB Windows 2008R2 server, nor did the 120GB SSD drive fail to keep up with the 80 ANALYZE.FILES or the 80 RESIZE commands we were issuing from our XLr8Resizer product within Eclipse. The only way we got this working was to set the retries to 1000 on reopening the connections. Although that number seems high, it helped and got us from our previous best of 39 processes to 80 processes. When we have a lot of time and cannot think of anything better to do, we will try for 500 processes.

Regards,
Doug
www.u2logic.com
"Eclipse based tools for the U2 programmer"

On Tuesday, 2 October 2012, HENDERSON MIKE, MR wrote:
> I have often thought about this - mostly in an idle moment or as a
> displacement activity for something less amusing that I ought to be doing ;-)
>
> First of all, Universe is already extremely parallel: there's a separate
> O/S thread for each TTY and for each phantom, and you can't get more
> parallel than that for interactive processing.
>
> So you want more parallelism for your batch processes.
> Different applications have different degrees of inherent parallelism.
> For example, in utility billing systems there is frequently the concept of
> a group of premises, based on the old concept of a foot-borne meter reader
> with a 'book' of readings to get. Each 'book' can be processed
> independently of every other. In payroll, each employee's record can be
> processed independently. Other areas of commerce have different
> characteristics.
>
> I think that whatever unit of parallelism you settle for, you'd need three
> processes: a 'dispatcher' that selects records for processing and queues
> them into some structure for processing; a set of 'workers' that take
> queued work items, process them, mark them as processed and put the results
> in some common store; and a 'monitor' that looks for unprocessed records
> and indications of stuck processes, and collates the results for final
> output.
> I've seen a couple of versions of this, one for electricity billing and
> another for overnight batch processing of report requests, both well over a
> decade ago, and neither still in use although their underlying packages are
> still being run.
>
> The major issue is that these days the whole entity in the general
> commercial world is far more likely to be I/O limited than CPU limited, and
> therefore introducing parallelism will be no help at all if the I/O system
> is already choked.
> Even if the system is currently CPU-limited, multi-threading may not
> produce much improvement without very careful design of the record locking
> philosophy - introducing parallelism will be no help if all the threads end
> up contending serially for one record lock or a small set of locks.
>
> If you want it to go faster, buy the CPU with the fastest clock you can
> get (not the one with the most cores), and put your database on SSD like
> Ross said.
> The Power7+ chips being announced any day now are rumoured to go to
> 5GHz+, maybe even more if you have half the cores on the chip disabled.
>
> Regards
>
> Mike
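A minimal UniVerse BASIC sketch of the dispatcher and monitor roles Mike describes. The WORK.QUEUE and INVOICES file names, the record layout and the worker program name are illustrative assumptions, not from any actual package; the workers would claim PENDING items with READU and rewrite them as DONE:

    * DISPATCHER: select the records to be processed and queue one work item per key.
    OPEN 'WORK.QUEUE' TO F.QUEUE ELSE STOP 'Cannot open WORK.QUEUE'
    OPEN 'INVOICES' TO F.INV ELSE STOP 'Cannot open INVOICES'
    SELECT F.INV
    LOOP
       READNEXT KEY ELSE EXIT
       WRITE 'PENDING' ON F.QUEUE, KEY    ;* field 1 = status; a worker sets it to DONE
    REPEAT
    * start however many workers you want, e.g. EXECUTE 'PHANTOM RUN BP QUEUE.WORKER'

    * MONITOR: poll the queue until every work item has been marked DONE.
    LOOP
       REMAINING = 0
       SELECT F.QUEUE TO 9
       LOOP
          READNEXT KEY FROM 9 ELSE EXIT
          READ ITEM FROM F.QUEUE, KEY THEN
             IF ITEM<1> # 'DONE' THEN REMAINING = REMAINING + 1
          END
       REPEAT
       IF REMAINING = 0 THEN EXIT
       SLEEP 5        ;* also a reasonable place to check for stuck workers
    REPEAT
    PRINT 'All work items processed'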
Re: [U2] [u2] Parallel processing in Universe
On 02/10/12 15:28, George Gallen wrote:
> What about a striped array of SSDs with a backup battery to flush the write
> buffer on power fail?

I guess a backup battery would save you. Basically, anything to prevent power dying in the middle of a write. But the striped array would probably simply mean several trashed drives instead of one. It's a known, guaranteed, "this is what will kill a drive" scenario, and an array would just mean more drives at risk.

The place I came across a major discussion about this (I knew of the issue earlier) said that some combination of Windows, an update, and a certain laptop was notorious for writing off drives. The update would flood the cache, then the laptop would suspend. Cue one dead drive and, if within warranty, one no-quibble replacement.

Cheers,
Wol
Re: [U2] [u2] Parallel processing in Universe
Yes, the low numbers are used more often. However, if you have sequential keys, just use the *last* two digits instead of the first two.

On Tue, Oct 2, 2012, Wols Lists wrote:
> Actually, this is a very BAD way of chopping a file into five even chunks.
> I'm not sure of the stats, but on any file with sequential keys, the first
> phantom will get the majority of the records, the second gets the majority
> of what's left, etc.
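A sketch of what that can look like inside each phantom, assuming purely numeric keys. The INVOICES file name, the NUM.PHANTOMS value and the way MY.PARTITION gets passed in are assumptions for illustration:

    NUM.PHANTOMS = 5
    MY.PARTITION = 2              ;* this phantom's slot, 0 to NUM.PHANTOMS-1
    OPEN 'INVOICES' TO F.INV ELSE STOP 'Cannot open INVOICES'
    SELECT F.INV
    LOOP
       READNEXT KEY ELSE EXIT
       * The trailing digits (MOD) of a sequential key are spread evenly across
       * the partitions, unlike the leading digit.
       IF MOD(KEY, NUM.PHANTOMS) # MY.PARTITION THEN CONTINUE
       READ REC FROM F.INV, KEY THEN
          * ... process KEY / REC here ...
       END
    REPEAT

Note that every phantom still scans the full set of keys; this only spreads the per-record processing, which ties into the I/O concerns raised later in the thread.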
Re: [U2] [u2] Parallel processing in Universe
In my example, I would grab whatever records were hashed into the 'group'. While it's not perfect, since there is 'overflow', I was just trying to think of a way to break a file into pieces that would otherwise process much like a BASIC select: just grab the 'group' and go. I can see it's probably not possible, but the topic got me thinking about 'what if'... (And we're UniData, so I have to apply that filter to most everything I read on the list anyway.)

On Monday, October 01, 2012, David Taylor wrote:

Or, let's suppose you wanted to process repetitive segments of one very large record using the same logic in a separate phantom process for each segment. How large a record can be read and processed in Universe?

Dave

> So how would a user 'chop up' a file for parallel processing?
> Ideally, if there was a Mod 10001 file (or whatever) it would seem like
> it would be 'ideal' to assign 2000 groups to 5 phantoms -- but I don't
> know how to 'start a BASIC select at Group 2001 or 4001' ...
>
> -----Original Message-----
> From: George Gallen
> Sent: Monday, October 01, 2012 3:29 PM
>
> OPENSEQ "/tmp/pipetest" TO F.PIPE ELSE STOP "NO PIPE"
> LOOP
>    READSEQ LINE FROM F.PIPE ELSE CONTINUE
>    PRINT LINE
> REPEAT
> STOP
> END
>
> Although, not sure if you might need to sleep a little between the
> READSEQ's ELSE and CONTINUE - might suck up CPU time when nothing is
> writing to the file.
>
> Then you could set up a printer in UV that did a "cat - >> /tmp/pipetest".
> Now your phantom just needs to print to that printer.
>
> George
>
> -----Original Message-----
> From: George Gallen
> Sent: Monday, October 01, 2012 4:16 PM
>
> The only thing about a pipe is that once it's closed, I believe it has
> to be re-opened by both ends again. So if point A opens one end, and
> point B opens the other end, once either end closes, it closes for
> both sides, and both sides would have to reopen again to use it.
>
> To eliminate this, you could have one end open a file, and have the
> other sides do a ">>" append to that file - just make sure you include
> some kind of data header so the reading side knows which process just
> wrote the data.
>
> -----Original Message-----
> From: u2ug
> Sent: Monday, October 01, 2012 4:11 PM
>
> pipes
>
> -----Original Message-----
> From: Wjhonson
>
> I suppose you would spawn phantoms but how do they communicate back to
> the master node?
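On Wjhonson's original question of how the phantoms report back to the master: besides the pipe and append-file ideas quoted above, a simple option is a shared hashed file that each phantom writes its results into under its own key, with the master polling until everyone has reported. A sketch in UniVerse BASIC; the PHANTOM.RESULTS file name, NUM.PHANTOMS, MY.PARTITION and the result layout are all assumptions:

    * In each phantom, when it finishes its share of the work:
    OPEN 'PHANTOM.RESULTS' TO F.RESULTS ELSE STOP 'Cannot open PHANTOM.RESULTS'
    RESULT = ''
    RESULT<1> = RECORDS.DONE      ;* whatever counts the phantom accumulated
    RESULT<2> = RUN.TOTAL         ;* whatever totals it accumulated
    WRITE RESULT ON F.RESULTS, 'RESULT.':MY.PARTITION

    * In the master, after spawning the phantoms:
    NUM.PHANTOMS = 5
    OPEN 'PHANTOM.RESULTS' TO F.RESULTS ELSE STOP 'Cannot open PHANTOM.RESULTS'
    LOOP
       REPORTED = 0
       FOR P = 1 TO NUM.PHANTOMS
          READ RESULT FROM F.RESULTS, 'RESULT.':P THEN REPORTED = REPORTED + 1
       NEXT P
       IF REPORTED = NUM.PHANTOMS THEN EXIT
       SLEEP 2
    REPEAT
    GRAND.TOTAL = 0
    FOR P = 1 TO NUM.PHANTOMS
       READ RESULT FROM F.RESULTS, 'RESULT.':P THEN GRAND.TOTAL = GRAND.TOTAL + RESULT<2>
    NEXT P
    PRINT 'Combined total: ':GRAND.TOTAL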
Re: [U2] [u2] Parallel processing in Universe
What if you created a duplicate file, did a SELECT and saved the list (non-sorted)? Each of the phantoms would do a GETLIST and loop through using READLIST/READU, and if the record were already locked, skip it until it reads an unlocked record (and locks it). Delete the record when finished.

On Tuesday, October 02, 2012, David Wolverton wrote:
> In my example, I would grab whatever records were hashed into the 'group'...
> I was just trying to think of a way to break a file into pieces that would
> otherwise process much like a BASIC select: just grab the 'group' and go.
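A rough UniVerse BASIC sketch of the worker loop George describes; the WORK.COPY duplicate file, the saved list name PROCESS.LIST and the processing step are assumptions:

    OPEN 'WORK.COPY' TO F.WORK ELSE STOP 'Cannot open WORK.COPY'
    GETLIST 'PROCESS.LIST' TO 1 ELSE STOP 'No saved list PROCESS.LIST'
    LOOP
       READNEXT KEY FROM 1 ELSE EXIT
       * Claim the record; if another phantom holds the lock, or it has already
       * been deleted (i.e. processed), just move on to the next key.
       READU REC FROM F.WORK, KEY LOCKED CONTINUE ELSE CONTINUE
       * ... do the real work on KEY / REC here ...
       DELETE F.WORK, KEY         ;* removes the claimed record and releases the lock
    REPEAT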
Re: [U2] [u2] Parallel processing in Universe
Ah - you would not even have to 'delete', as long as the 'locks' are held long enough. Meaning, if you know you will have 20 phantoms, each phantom would keep a list of 'keys locked', and once it hits 21 (or 40 if you want insurance, LOL) in the list, it would unlock the earliest lock - that way there is no way any other phantom could process anything twice... As each phantom runs, if it hits a locked record, it would move on to the next item in the list. Great idea!

DW

On Tuesday, October 02, 2012, George Gallen wrote:
> What if you created a duplicate file, did a SELECT and saved the list
> (non-sorted)? Each of the phantoms would do a GETLIST and loop through using
> READLIST/READU, and if the record were already locked, skip it until it reads
> an unlocked record (and locks it). Delete the record when finished.
Re: [U2] [u2] Parallel processing in Universe
You've highlighted one problem here. By having multiple processes accessing the disk in different locations, you destroy cache optimization and seek times. More phantoms = less performance. This assumes I/O is a bigger concern than CPU, which is generally the case.

More phantoms = more communication, which also adds another overhead that reduces performance. Introducing more phantoms than CPU cores, you increase the amount of context switching, which once again hurts your cache usage as well as adding bigger overheads on the CPU.

In short, except for very specific cases, increasing 'concurrency' through phantoms on a single machine is generally ill-advised, resulting in longer processing times, higher average system loads and, worse yet, greater system complexity (and hence more ways for things to break). As mentioned earlier, more system-level architectural changes (such as multiple machines, or at least file storage on different disks/spindles for each process) are required if you want benefit from this sort of work.

On Monday, October 01, 2012, David Wolverton wrote:
> OK - I was trying to create a 'smoother use' of the disk and 'read ahead' --
> in this example the disk would be chattering from the heads moving all over
> the place. I was trying to find a way to make this process more 'orderly'
> -- is there one?
Re: [U2] [u2] Parallel processing in Universe
Great point!! I think we can agree that 'spinning media latency' is the enemy, and having phantoms increase the 'head dance' can make things worse, not better! Many problems go away or become trivial as spinning media rides off into the sunset.

I've advised customers that just moving 'code files' to a tiny SSD would likely increase overall system performance on Windows boxes. Just waiting until the price of enterprise SSDs makes them a no-brainer... Until then, even small SSDs will help!

On Tuesday, October 02, 2012, Daniel McGrath wrote:
> By having multiple processes accessing the disk in different locations, you
> destroy cache optimization and seek times. More phantoms = less performance.
Re: [U2] [u2] Parallel processing in Universe
Yes, SSD will definitely help. Just keep in mind, it doesn't prevent all the negatives with regard to I/O, particularly caching. Disk caching in a modern system is fairly complex, but at a high level it is done not only by the controller but by the OS as well. So randomly flying around the disk still causes cache thrashing. :(

On Tuesday, October 02, 2012, David Wolverton wrote:
> Great point!! I think we can agree that 'spinning media latency' is the enemy,
> and having phantoms increase the 'head dance' can make things worse, not better!
Re: [U2] [u2] Parallel processing in Universe
The idea of the phantoms would be to read the file in order, not randomly - just in order from five different starting points. So you should still get the benefit of some caching.

On Tue, Oct 2, 2012, Daniel McGrath wrote:
> Disk caching in a modern system is fairly complex, but at a high level it is
> done not only by the controller but by the OS as well. So randomly flying
> around the disk still causes cache thrashing. :(
Re: [U2] [u2] Parallel processing in Universe
If 5 phantoms were running and read in order but from 5 different starting points, the records would essentially still be processed in a random order, if you were to lay out the record IDs as they get processed.

George

On Tuesday, October 02, 2012, Wjhonson wrote:
> The idea of the phantoms would be to read the file in order, not randomly -
> just in order from five different starting points. So you should still get
> the benefit of some caching.
Re: [U2] [u2] Parallel processing in Universe
The point of the caching concern is related to read-ahead, and you will still get some benefit from this if your five phantoms are each reading their *portion* of the file in order, which they should.

On Tue, Oct 2, 2012, George Gallen wrote:
> If 5 phantoms were running and read in order but from 5 different starting
> points, the records would essentially still be processed in a random order,
> if you were to lay out the record IDs as they get processed.
Re: [U2] [u2] Parallel processing in Universe
OK, I see what you're saying... I'll buy that.

On Tuesday, October 02, 2012, Wjhonson wrote:
> The point of the caching concern is related to read-ahead, and you will still
> get some benefit from this if your five phantoms are each reading their
> *portion* of the file in order, which they should.
Re: [U2] [u2] Parallel processing in Universe
Which was my question -- is there a way to 'jump to' a group, or do a BASIC SELECT with a 'starting/ending' group, so that again, with a modulo 10001 file, one phantom does 'groups' 1-2000, the next phantom does 'groups' 2001-4000, etc.? But I can't see that it's really possible without jumping through hoops that make it unattractive at best! At least on UniData!

DW

On Tuesday, October 02, 2012, George Gallen wrote:
> OK, I see what you're saying... I'll buy that.
Re: [U2] [u2] Parallel processing in Universe
You may not need to know what *group* you are in, per se, if you are willing to use the file stats record. You can determine from the last stats how many records are in your file. Then your master program just reads the keys until it gets to the 50,000th key (or whatever), and then spawns a phantom, telling it which key to start with and how many keys to process before it ends. Or maybe you don't need the stats file if UniData has @SELECTED to tell you how many keys there are.
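For illustration, here is a minimal, untested UniVerse BASIC sketch of the master/worker split described above. The file CUSTOMER, the control file CHUNK.CTRL and the worker program PROCESS.CHUNK are placeholder names, and how the chunk number reaches each phantom (@SENTENCE, a control record, etc.) is site-specific.

* MASTER.SPLIT - carve a file's keys into contiguous chunks, one per phantom.
* Placeholder names: CUSTOMER (data file), CHUNK.CTRL (control file),
* PROCESS.CHUNK (worker program in BP).
NUM.PHANTOMS = 5
OPEN 'CUSTOMER' TO F.CUST ELSE STOP 'Cannot open CUSTOMER'
OPEN 'CHUNK.CTRL' TO F.CTRL ELSE STOP 'Cannot open CHUNK.CTRL'

SELECT F.CUST               ;* BASIC select: keys come back in group order
KEYS = ''
TOTAL = 0
LOOP
   READNEXT ID ELSE EXIT
   TOTAL = TOTAL + 1
   KEYS<-1> = ID            ;* fine for a sketch; a real job would stream keys
REPEAT

CHUNK.SIZE = INT(TOTAL / NUM.PHANTOMS) + 1
POS = 1
FOR P = 1 TO NUM.PHANTOMS
   CHUNK = ''
   FOR I = POS TO POS + CHUNK.SIZE - 1 WHILE I <= TOTAL
      CHUNK<-1> = KEYS<I>
   NEXT I
   POS = POS + CHUNK.SIZE
   WRITE CHUNK ON F.CTRL, 'CHUNK':P    ;* worker P reads this record and
                                       ;* loops over its own slice of keys
   EXECUTE 'PHANTOM RUN BP PROCESS.CHUNK ':P   ;* passing P is site-specific
NEXT P

Each worker then only ever touches its own contiguous slice, so it still reads its portion of the file in order, which is the read-ahead benefit discussed earlier in the thread.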
Re: [U2] [u2] Parallel processing in Universe
Could also avoid the lock contention if each phantom had knowledge of the others, so "phantom 1" could only process @ID 1, 6, 11, etc., phantom 2 would do 2, 7, 12, and so on. Of course, if you are operating with a select list, this already implies that you have processed the file once, so your "batch process" is actually a re-read; so, in the absence of a suitable index, perhaps employing the "Drumheller trick" would be worth consideration.

Ross Ferris
Stamina Software
Visage > Better by Design!
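A rough sketch of that interleaved split, assuming purely numeric record IDs and five workers; the file ORDERS and the subroutine PROCESS.RECORD are placeholder names, and each phantom would be started with its own value of MY.SLOT.

* WORKER.MOD - each phantom processes only its own residue class of @IDs,
* so phantom 1 takes 1, 6, 11, ..., phantom 2 takes 2, 7, 12, ... and so on.
* Placeholder names: ORDERS (data file), PROCESS.RECORD (application sub).
NUM.PHANTOMS = 5
MY.SLOT = 1                         ;* set differently for each phantom
OPEN 'ORDERS' TO F.ORDERS ELSE STOP 'Cannot open ORDERS'

SELECT F.ORDERS
LOOP
   READNEXT ID ELSE EXIT
   * every phantom still walks the full key list; it only skips reading
   * (and locking) the records that belong to the other phantoms
   IF MOD(ID, NUM.PHANTOMS) # MOD(MY.SLOT, NUM.PHANTOMS) THEN CONTINUE
   READ REC FROM F.ORDERS, ID THEN
      CALL PROCESS.RECORD(ID, REC)  ;* locking/updating left to the sub
   END
REPEAT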
Re: [U2] [u2] Parallel processing in Universe
Depends on what you call a "no-brainer" --> to me, $4K for an 800GB Intel 910 SSD seems "reasonable" for what you get (10x full-drive writes every day for 5 years has the endurance angle covered, IMHO - 400GB is $2K if your database will fit) and by today's standards represents "reasonable" value. Not quite at the performance level of Fusion IO, but cheap enough to just about be "affordable".

Ross Ferris
Stamina Software
Visage > Better by Design!

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton
Sent: Wednesday, 3 October 2012 3:19 AM
To: 'U2 Users List'
Subject: Re: [U2] [u2] Parallel processing in Universe

Great point!! I think we can agree that 'spinning media latency' is the enemy, and having phantoms increase the 'head dance' can make things worse, not better! Many problems go away or become trivial as spinning media rides off into the sunset. I've advised customers that just moving 'code files' to a tiny SSD would likely increase overall system performance on Windows boxes. Just waiting until the price of enterprise SSDs makes them a no-brainer... Until then, even small SSDs will help!

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Daniel McGrath
Sent: Tuesday, October 02, 2012 12:05 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

You've highlighted one problem here. By having multiple processes accessing the disk in different locations, you destroy cache optimization and increase seek times. More phantoms = less performance. This assumes I/O is a bigger concern than CPU, which is generally the case. More phantoms = more communication, which also adds overhead that reduces performance. If you introduce more phantoms than CPU cores, you increase the amount of context switching, which once again hurts your cache usage as well as adding bigger overheads on the CPU. In short, except for very specific cases, increasing 'concurrency' through phantoms on a single machine is generally ill-advised, resulting in longer processing times, higher average system loads and, worse yet, greater system complexity (and hence more ways for things to break). As mentioned earlier, more system-level architectural changes (such as multiple machines, or at least file storage on different disks/spindles for each process) are required if you want to benefit from this sort of work.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton
Sent: Monday, October 01, 2012 4:47 PM
To: 'U2 Users List'
Subject: Re: [U2] [u2] Parallel processing in Universe

OK - I was trying to create a 'smoother use' of the disk and 'read ahead' -- in this example the disk would be chattering from the heads moving all over the place. I was trying to find a way to make this process more 'orderly' -- is there one?

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Robert Houben
Sent: Monday, October 01, 2012 4:48 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

Create an index on a dict pointing at the first character of the key, and have each phantom take two digits.
(0-1, 2-3, 4-5, 6-7, 8-9)

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of David Wolverton
Sent: October-01-12 2:43 PM
To: 'U2 Users List'
Subject: Re: [U2] [u2] Parallel processing in Universe

So how would a user 'chop up' a file for parallel processing? Ideally, if there was a Mod 10001 file (or whatever), it would seem 'ideal' to assign 2000 groups to each of 5 phantoms -- but I don't know how to 'start a BASIC SELECT at group 2001 or 4001'...

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen
Sent: Monday, October 01, 2012 3:29 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

OPENSEQ "/tmp/pipetest" TO F.PIPE ELSE STOP "NO PIPE"
LOOP
   READSEQ LINE FROM F.PIPE ELSE CONTINUE
   PRINT LINE
REPEAT
STOP
END

Although I'm not sure whether you might need to sleep a little between the READSEQ's ELSE and the CONTINUE; it might suck up CPU time when nothing is writing to the file. Then you could set up a printer in UV that did a "cat - >> /tmp/pipetest". Now your phantom just needs to print to that printer.

George

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of George Gallen
Sent: Monday, October 01, 2012 4:16 PM
To: U2 Users List
Subject: Re: [U2] [u2] Parallel processing in Universe

The only thing about a pipe is that once it's closed, I be
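As a rough, untested sketch of Robert's index suggestion (UniVerse syntax; UniData commands differ slightly), one could add an I-type dictionary item exposing the first character of the key, index it, and have each phantom select only its own pair of leading digits. The names ORDERS and FIRST.CHAR, and the assumption of numeric keys, are placeholders.

* Dictionary item FIRST.CHAR on the ORDERS file (I-type), assuming numeric keys:
*   001: I
*   002: @ID[1,1]
*   003:
*   004: First char
*   005: 1L
*   006: S
* Then at TCL:
*   CREATE.INDEX ORDERS FIRST.CHAR
*   BUILD.INDEX ORDERS FIRST.CHAR
*
* Each phantom then selects only its share of leading digits via the index:
OPEN 'ORDERS' TO F.ORDERS ELSE STOP 'Cannot open ORDERS'
* this phantom takes keys starting with 0 or 1; others take 2-3, 4-5, ...
EXECUTE 'SELECT ORDERS WITH FIRST.CHAR = "0" OR WITH FIRST.CHAR = "1"'
LOOP
   READNEXT ID ELSE EXIT             ;* uses the select list left active
   READ REC FROM F.ORDERS, ID THEN
      PRINT 'processing ':ID         ;* placeholder for the real work
   END
REPEAT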
Re: [U2] [u2] Parallel processing in Universe
On Tue, Oct 2, 2012 at 5:58 PM, Symeon Breen wrote:
> However map reduce and hadoop are pretty horrible things. Even Google have
> moved away from it with Caffiene etc.
>
Going OT a little, I think Google is replacing "BigTable", which was part of Caffeine in 2010, with "Spanner" now. Here is a paper about it released last month: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/spanner-osdi2012.pdf ...amusingly, they call it a "Multi-Version Database"; can't wait till that gets abbreviated.

- Robert
[U2] What remote support product do YOU use?
Just posted this to the pick/multivalue Google group, but figure there may be some (larger?) U2-only people who may have valuable insight on the subject, so ...

We have been using TeamViewer (www.teamviewer.com) for the past 18 months or so, and I'm generally very happy with it - I can access Windows, Linux and Apple hosts from my desktop, or even my iPhone (the screen is too small for remote support, so I will be upgrading to a Galaxy Note 2 phablet soon and will have a chance to try out the Android client) - and I would be happy to recommend it to others looking for a solution (quote coupon code 95051-42-600991 to get a 3% discount).

We had previously used LogMeIn, RDP for server access, and/or even VNC if we had VPN access, but TeamViewer just works for us. Licensing is one-off, based on the number of people at our end who will be running concurrent sessions to clients, rather than paying per client system, and I keep toying with the idea of integrating it with Visage as a "Support" button. We also use TeamViewer to run presentations, though I tend to use Skype rather than the integrated VOIP capabilities.

That said, I'm also curious whether anyone has a better product they have used (first hand, rather than just something they have read about which sounded good), and/or especially whether you previously had a commercial TeamViewer licence and moved on. Hoping for confirmation of our choice, but happy to have my horizons expanded :-)

Ross Ferris
Stamina Software
Visage > Better by Design!