Google Architecture
It appears that Google architecture is the antithesis of conventional mainframe application architecture in all aspects.

http://labs.google.com/papers/googlecluster-ieee.pdf

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
Google Architecture
Just a shame it doesn't actually work. Ask any webmaster. Or check out the discussion groups on WebMasterWorld. Start at http://www.webmasterworld.com/forum30/ and read. Disaster after disaster - no search integrity at all. See http://www.webmasterworld.com/forum30/34588.htm too - "Big Daddy" is a disaster that Google is desperately trying to get out from under.

Simple test - do a search on Google. Any search. Let it default to 10 hits per page, and collect all the pages. Then repeat the search, asking for 100 hits per page. Compare the results. They will be different. What they don't tell you is that each of your 10-per-page searches might be served by a different data center, using different indices and different databases.

Google is a sham, as perusal of the sources above will rapidly show. It succeeds because the mass of people trust what computers produce and there are no checks or balances. Internet search engines are the largest unregulated aspect of human activity.

And Google's database(s) are also hopelessly out of date. I have logs from my web site that show Google (despite its "sitemaps" programme) simply hasn't spidered changed pages for weeks. MSN, Yahoo, Ask (and even IBM Almaden) visit much more frequently.

If any serious CMG-type person did a rigorous analysis of Google and published it, the stock price would crater.

--
Phil Payne
http://www.isham-research.co.uk
+44 7833 654 800
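The 10-per-page versus 100-per-page comparison described in this message could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `compare_result_sets` and the sample URLs are hypothetical, and the sketch assumes the result URLs have already been collected by some other means; it does not query any search engine.

```python
from typing import Dict, List


def compare_result_sets(paged: List[List[str]], single: List[str]) -> Dict[str, object]:
    """Compare hits collected 10 per page with one 100-per-page result list."""
    # Flatten the per-page lists into one ordered list of hits.
    combined = [url for page in paged for url in page]
    combined_set, single_set = set(combined), set(single)
    # Count ranks where both lists have an entry but the entries differ.
    differing = sum(1 for a, b in zip(combined, single) if a != b)
    return {
        "only_in_paged": sorted(combined_set - single_set),
        "only_in_single": sorted(single_set - combined_set),
        "positions_differing": differing,
        "identical": combined == single,
    }


# Hypothetical sample data: two pages of two hits versus one page of four.
pages = [["a.example/1", "b.example/2"], ["c.example/3", "d.example/4"]]
one_page = ["a.example/1", "c.example/3", "b.example/2", "e.example/5"]
report = compare_result_sets(pages, one_page)
print(report)
```

A report with `identical` set to `False` would only show that the two collections differ; it says nothing about why (different data centers, index updates between requests, and so on).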
Google Architecture
Google's solution is simply not scaling. Period. Check out the complaints of massive page loss both on 28th March and 26th April.

A lot of people have been suggesting that Google might move to "mainframes" - although they don't seem to mean zSeries. Perhaps a POWER or BladeServer solution. Perhaps Superdome. Someone from IBM should be talking to these people. And probably is.

As for the currency of Google's results - they're ordure (avoiding too many netnanny bounces). I can prove it - I have the logs and I've posted details in many of the webmaster forums. The most intensive spiderer (?) at present is Yahoo, by a country mile. The most up to date index is MSN - no doubt whatsoever about it. Ask is pretty well up there too. Google is MILES behind on both currency and content.

Search Usenet for "the Google dance". Check out http://google.fergusons.dk every now and then.

(P.S. Object REXX is just GREAT for web server log analysis. Perhaps SAS would be better, but .)

--
Phil Payne
http://www.isham-research.co.uk
+44 7833 654 800
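For readers who want to check crawler activity in their own logs, here is a minimal sketch of that kind of analysis. It is written in Python rather than Object REXX, and it is not the analysis referred to in this message. It assumes the web server writes the common "combined" log format with the user agent as the last quoted field, and the user-agent substrings in `CRAWLERS` are assumptions to be checked against real logs; the sample lines are invented.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed user-agent substrings for each engine; verify against your own logs.
CRAWLERS = {"Googlebot": "Google", "Slurp": "Yahoo", "msnbot": "MSN", "Teoma": "Ask"}

# In the combined log format the user agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(lines: Iterable[str]) -> Counter:
    """Count requests per search engine, based on the user-agent field."""
    hits = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format
        agent = match.group(1)
        for token, engine in CRAWLERS.items():
            if token in agent:
                hits[engine] += 1
                break
    return hits


# Invented sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [26/Apr/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.5 - - [26/Apr/2006:10:05:00 +0000] "GET /a.htm HTTP/1.0" 200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.5 - - [26/Apr/2006:11:05:00 +0000] "GET /b.htm HTTP/1.0" 200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.6 - - [26/Apr/2006:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "msnbot/1.0"',
    '1.2.3.7 - - [26/Apr/2006:12:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (an ordinary browser)"',
]
crawler_hits = count_crawler_hits(sample)
print(crawler_hits.most_common())
```

Raw hit counts measure crawling intensity only; judging how current an index is would also need the dates on which changed pages were fetched.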
Google Architecture
I tend to retain my effusive moments for things that work.

http://www.webmasterworld.com/forum30/34984.htm

The latest of many "Google Datacenter" threads. I don't care about Google's stock-supporting p/r spin and the amount of mutual back-slapping they go in for - their system just doesn't work. You can read in the cited thread, in the rest of that forum, or all over Usenet about the "Google dance" - a generic term

Check out the site http://oy-oy.eu/pages/ - which compares the indices in 39 disparate Google data centers. It stabilised sometime this morning - but you will sometimes see three-to-one discrepancies across the system.

--
Phil Payne
http://www.isham-research.co.uk
+44 7833 654 800
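A "three-to-one discrepancy" of the kind mentioned here can be expressed as the ratio between the largest and smallest page counts reported for the same site. The sketch below assumes that reading; the function name `discrepancy_ratio`, the data center labels, and the counts are all hypothetical and are not taken from the site cited in the message.

```python
from typing import Dict


def discrepancy_ratio(page_counts: Dict[str, int]) -> float:
    """Return the largest reported page count divided by the smallest."""
    counts = list(page_counts.values())
    if not counts or min(counts) <= 0:
        raise ValueError("need at least one count, and all counts must be positive")
    return max(counts) / min(counts)


# Hypothetical indexed-page counts for one site, per data center.
observed = {"dc-a": 300, "dc-b": 120, "dc-c": 100}
ratio = discrepancy_ratio(observed)
print(f"{ratio:.1f}-to-one")
```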
Re: Google Architecture
On 01/06/06, Bill Richter <[EMAIL PROTECTED]> wrote:
> It appears that Google architecture is the antithesis of conventional
> mainframe application architecture in all aspects.
>
> http://labs.google.com/papers/googlecluster-ieee.pdf

You may find this an interesting read too...

http://labs.google.com/papers/gfs-sosp2003.pdf

--
Steve
Despair - It's always darkest just before it goes pitch black...
Re: Google Architecture
> It appears that Google architecture is the antithesis of conventional
> mainframe application architecture in all aspects.
>
> http://labs.google.com/papers/googlecluster-ieee.pdf

Yup. That's what I've been telling you guys for the last couple of years. There are other approaches out there that are becoming pretty successful. I would not want to build a banking application that way, but for "drive-by" browsing where the integrity/reproducibility of the results isn't so important, theirs is the way to go.

CC
Re: Google Architecture
Actually, financial queries might use this architecture. The significant difference between this workload and "typical commercial" workloads is that they do no updates, and therefore have no data integrity issues. Even a bankcard authorization requires recording the details of the query in a reliable way and, if the transaction completes, further recording the details attendant to the completion.

Another bonus Google has is that they say they write all their own software. If that includes a modified or proprietary opsys, there is no need to worry about viruses, etc. Reminds me of the tradeoffs that used to tilt the balance between off-the-shelf electronics and ASICs.

There is a subtext here. If their business is so unique, new entrants have a significant barrier to entry. New entrants will either have a significant upfront development cost prior to initiation of the revenue stream or will have to develop on off-the-shelf hardware/software and migrate to specialization.

What I would like to see is the paper where Google talks about how to beat the "garbage-in/garbage-out" syndrome that generates a million hits on a query, most of which are commercial sites, most of which are porn sites.

IBM Mainframe Discussion List wrote on 06/01/2006 11:36:43 AM:
> > It appears that Google architecture is the antithesis of conventional
> > mainframe application architecture in all aspects.
> > http://labs.google.com/papers/googlecluster-ieee.pdf
> Yup. That's what I've been telling you guys for the last couple of
> years. There are other approaches out there that are becoming pretty
> successful. I would not want to build a banking application that way,
> but for "drive-by" browsing where the integrity/reproducibility of the
> results isn't so important, theirs is the way to go.
Re: Google Architecture
Bill Richter wrote: It appears that Google architecture is the antithesis of conventional mainframe application achitecture in all aspects. http://labs.google.com/papers/googlecluster-ieee.pdf and the difference between that and loosely-coupled or parallel sysplex? long ago and far away, my wife was con'ed to going to POK to be in charge of loosely-coupled architecture ... she was in the same organization with the guy in charge of tightly-coupled architecture. while she had come up with peer-coupled shared data architecture http://www.garlic.com/~lynn/subtopic.html#shareddata it was tough slogging because all the attention was focused on tightly-coupled architecture at the time. also she had battles with the sna forces ... who wanted control of all communication that left the processor complex (i.e. outside of direct disk i/o, etc). part of the problem was that in the early days of SNA ... she had co-authored a "peer-to-peer" network architecture with Bert Moldow ... AWP39 (somewhat viewed in competition with sna). while SNA was tailored for centralized control of a large number of dumb terminals ... it was decidedly lacking in doing peer-to-peer operations with large numbers of intelligent peers. a trivial example was sjr had done cluster 4341 implementation used highly optimized peer-to-peer protocols running over a slightly modified trotter/3088 (i.e. eventually came out as conventional ctca ... but with interconnection for eight processors/channels). peer-to-peer, asynchronous could achieve cluster synchronization in under a second elapsed time (for eight processors). doing the same thing with SNA increased the elapsed time to approx. a minute. the group was forced to only release the SNA-based implementation to customers ... which obviously had severe scaling properties as the numbers in a cluster increased. the communication division did help with significant uptake of PCs in the commercial environment. 
a customer could replace a dumb 327x with a PC for approx. the same price, get datacenter terminal emulation connectivity and in the same desktop footprint also have some local computing capability. as a result, you also found the communication group with a large install base of products in the terminal emulation market segment (with tens of millions of emulated dumb terminals) http://www.garlic.com/~lynn/subnetwork.html#emulation in the late 80s, we had come up with 3-tier architecture (as an extension to 2-tier, client/server) and were out pitching it to customer executives. however, the communication group had come up with SAA which was oriented toward trying to stem the tide moving to peer-to-peer networking, client/server, and away from dumb terminals. as a result, we tended to take a lot of heat from the SAA forces. http://www.garlic.com/~lynn/subnetwork.html#3tier in the same time frame, a senior engineer from the disk group in san jose managed to sneak a talk into the internal, annual world-wide communication conference. he began his talk with the statement that the communication group was going to be responsible for the demise of the disk division. basically the disk division had been coming up with all sorts of high-thruput, peer-to-peer network capability for PCs and workstations to access the datacenter mainframe disk farms. the communication group was constantly opposing the efforts, protecting the installed base of terminal emulation products. recent reference to that talk: http://www.garlic.com/~lynn/2006k.html#25 Can anythink kill x86-64? i had started the high-speed data transport project in the early 80s ... hsdt http://www.garlic.com/~lynn/subnetwork.html#hsdt and had a number of T1 (1.5mbit) and higher speed links for various high-speed backbone applications. one friday, somebody in the communication group started an internal discussion on high-speed communication with some definitions ... 
recent posting referencing this http://www.garlic.com/~lynn/2006e.html#36

low-speed          <9.6kbits
medium-speed       19.2kbits
high-speed         56kbits
very high-speed    1.5mbits

the following monday, i was in the far-east talking about purchasing some hardware and they had the following definitions on their conference room wall

low-speed          >20mbits
medium-speed       100mbits
high-speed         200-300mbits
very high-speed    >600mbits

part of this was the communication division 37xx product line only supported up to 56kbit links. They had recently done a study to determine if T1 support was required ... which concluded that in 8-10 years there would only be 200 mainframe customers requiring T1 communication support. The issue could have been that the people doing the study were supposed to come up with the results supporting the current product line ... or maybe they didn't understand the evolving
Re: Google Architecture
Lynn Wheeler wrote:
> > > http://labs.google.com/papers/googlecluster-ieee.pdf
>
> and the difference between that and loosely-coupled or parallel sysplex?

GOOGLE is certainly a loosely coupled architecture, but as you of all people would know, there are significant differences between that and a parallel sysplex. The main feature they (and Amazon as well btw) focus on is the full burdened price of their computational units including power, cooling, footprint etc. and that makes economic sense for them given the nature of their business application.

In sysplex the computational unit is an LPAR and even with the most wildly optimistic assumptions, any LPAR in a parallel sysplex is orders of magnitude more expensive than any commodity PC. Totaling up the full burden for an LPAR only puts that comparison further out of the park. The economics of system design change in fundamental ways when the computational units approach a throw-away price level.

LPAR, z/OS and sysplex have extremely clever and successful, but complex processing for RAS. z/OS has a healthy fraction of its code devoted to isolating and recovering from failures at every level from an individual task all the way up to sparing an LPAR out of the plex and driving recovery on other systems. The GOOGLE folks just accept that many things are going to be broken a lot of the time, so a throw-away mentality is not so whacky in their world.

It really is possible to build highly reliable systems from commodity parts if you're prepared to write a lot of smart software. What has not happened (yet) is productization of that stuff. When that does come, "look out!"

CC
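The claim that reliable systems can be built from unreliable commodity parts rests on simple redundancy arithmetic, sketched below. This is a textbook model and not anything taken from the Google paper or this message: it assumes replicas fail independently and that the service is up whenever at least one replica is up. The function name and the 99% figure are illustrative assumptions.

```python
def replicated_availability(unit_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent units is up."""
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All replicas must be down at once for the service to be down.
    return 1.0 - (1.0 - unit_availability) ** replicas


# Assumed figure: each commodity machine is up 99% of the time.
three_pcs = replicated_availability(0.99, 3)
print(f"one machine: 0.990000, three replicas: {three_pcs:.6f}")
```

Real failures are often correlated (shared power, network, or software faults), so the independence assumption makes this an optimistic upper bound; handling that is part of the "smart software" the message refers to.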
Re: Google Architecture
[EMAIL PROTECTED] (Craddock, Chris) writes:
> GOOGLE is certainly a loosely coupled architecture, but as you of all
> people would know, there are significant differences between that and a
> parallel sysplex. The main feature they (and Amazon as well btw) focus
> on is the full burdened price of their computational units including
> power, cooling, footprint etc. and that makes economic sense for them
> given the nature of their business application.

so the issue is effectively how fast fault isolation/recovery/tolerant technology becomes commoditized. this is somewhat the scenario that happened with RAID ... when they first appeared, they were frequently depreciated compared to "mainframe" DASD ... but since then, they've effectively turned into the standard.

for a little drift, i've repeated several times before what I did for i/o supervisor for the dasd engineering and product test labs (bldg. 14 & bldg 15) http://www.garlic.com/~lynn/subtopic.html#disk

they had "testcells" ... basically hardware under development ... the term testcells somewhat comes from the security provisions ... the test hardware were in individual heavy steel mesh "cages" (testing cells) ... inside a secured machine room.

they had tried doing testing in an operating system environment ... but at the time, MVS had a MTBF of 15 mins operating with a single testcell. i undertook to rewrite the i/o supervisor so that it would never fail ... even when operating half-dozen to a dozen testcells concurrently allowing the processor complex then to also be used for some number of other tasks concurrently.

bldg 14/15 tended to get early engineering models of processors ... also as part of disk testing. however, in the "stand-alone" mode of operation ... the processors were dedicated to scheduled i/o testing (which tended to be less than one percent cpu utilization). with the bullet proof i/o supervisor ... the idle cpu could be put to other tasks. at one point, bldg.
15 got the first engineering 3033 (outside of POK) dedicated for disk i/o testing. however, once we had testing going on in an operating system environment, we could take advantage of essentially, an otherwise idle processor. one of the applications that we moved onto the machine was the air bearing modeling that was going on as part of the development of the 3380 floating heads. SJR had a 370/195 that was being operated as an internal service ... and the air bearing modeling might get an hour or so a couple times a month. however, with essentially an idle 3033 sitting across the street ... we could drastically improve that (the 370/195 was peak rated around 10mips ... but most codes ran in the 5mip range ... and the 3033 was in the 4.5mip range ... but essentially unlimited amounts of 3033 time was still better than a couple hrs of 370/195 time a couple times a month).

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Re: Google Architecture
[EMAIL PROTECTED] writes: Acutally financial query might use this architecture, except that the significant difference between this workload and "typical commercial" workloads -- they do no updates. Therefore no data integrity issues. re: http://www.garlic.com/~lynn/2006l.html#4 Google Architecture http://www.garlic.com/~lynn/2006l.html#6 Google Architecture we took some amount of heat in the 80s from the communication group working on high-speed data transport http://www.garlic.com/~lynn/subnetwork.html#hsdt and 3-tier architecture (as extension of 2-tier, client/server) http://www.garlic.com/~lynn/subnetwork.html#3tier then in the early 90s ... when we were working on scaling non-mainframe loosely-coupled for the commercial market http://www.garlic.com/~lynn/subtopic.html#hacmp we got hit and told we couldn't work on anything involving more than four processors ... minor reference: http://www.garlic.com/~lynn/95.html#13 however, the cluster scaling has evolved in a number of ways. high-energy physics picked it up and evolved it as something called GRID. a number of vendors also contributed a lot of work on GRID technology and since are out pushing it in various commercial market segments ... including financial. some of the early financial adopters are using GRID for doing complex financial analysis in real-time. some topic drift ... i gave a talk a couple years ago at the global grid forum https://forge.gridforum.org/docman2/ViewCategory.php?group_id=42&category_id=721 select "GGF11-desing-security-nakedkey" in the above. misc. 
GRID related news article in the commercial market Investment Banks Using Grid Computing Models http://www.epaynews.com/index.cgi?survey=&ref=browse&f=view&id=1148478974861413176&block= ASPEED Taking Financial Grids to the Next Level http://www.gridtoday.com/grid/673718.html Wachovia uses grid technology to speed up transaction apps http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9000476 Grid Computing That Heals Itself http://www.enterpriseitplanet.com/networking/news/article.php/3606041 GRID WORLD 2006: New IBM software brings autonomic computing to Grids http://www.enterprisenetworksandservers.com/newsflash/art.php?589 as somewhat referenced in a couple of the above ("batch processing going back 50 years")... bringing "batch" to GRID can be somewhat viewed as JES3 on steroids. before getting con'ed into going to pok to be in charge of loosely coupled architecture http://www.garlic.com/~lynn/subtopic.html#shareddata my wife had been in the JES group in g'burg. She had been one of the catchers for ASP ... as part of its transformation into JES3. She had also done a business analysis of the major JES2 and JES3 features as part of proposal for creating a merged product. however, that never made it very far ... in part because of a lot of internal politics. random past posts mentioning jes3: http://www.garlic.com/~lynn/2000.html#13 Computer of the century http://www.garlic.com/~lynn/2000.html#76 Mainframe operating systems http://www.garlic.com/~lynn/2000.html#78 Mainframe operating systems http://www.garlic.com/~lynn/2000f.html#30 OT? http://www.garlic.com/~lynn/2000f.html#37 OT? http://www.garlic.com/~lynn/2001b.html#73 7090 vs. 7094 etc. 
http://www.garlic.com/~lynn/2001c.html#69 Wheeler and Wheeler http://www.garlic.com/~lynn/2001g.html#44 The Alpha/IA64 Hybrid http://www.garlic.com/~lynn/2001g.html#46 The Alpha/IA64 Hybrid http://www.garlic.com/~lynn/2001g.html#48 The Alpha/IA64 Hybrid http://www.garlic.com/~lynn/2001n.html#11 OCO http://www.garlic.com/~lynn/2002e.html#25 Crazy idea: has it been done? http://www.garlic.com/~lynn/2002k.html#48 MVS 3.8J and NJE via CTC http://www.garlic.com/~lynn/2002n.html#58 IBM S/370-168, 195, and 3033 http://www.garlic.com/~lynn/2002q.html#31 Collating on the S/360-2540 card reader? http://www.garlic.com/~lynn/2002q.html#35 HASP: http://www.garlic.com/~lynn/2004b.html#53 origin of the UNIX dd command http://www.garlic.com/~lynn/2004c.html#6 If the x86 ISA could be redone http://www.garlic.com/~lynn/2004e.html#51 Infiniband - practicalities for small clusters http://www.garlic.com/~lynn/2004g.html#39 spool http://www.garlic.com/~lynn/2004o.html#32 What system Release do you use... OS390? z/os? I'm a Vendor S http://www.garlic.com/~lynn/2005o.html#39 JES unification project http://www.garlic.com/~lynn/2005p.html#44 hasp, jes, rasp, aspen, gold http://www.garlic.com/~lynn/2005p.html#45 HASP/ASP JES/JES2/JES3 http://www.garlic.com/~lynn/2005q.html#0 HASP/ASP JES/JES2/JES3 http://www.garlic.com/~lynn/2005q.html#7 HASP/ASP JES/JES2/JES3 http://www.garlic.com/~lynn/2005q.html#15 HASP/ASP JES/JES2/JES3 http://www.garlic.com/~lynn/2005q.html#16 HASP/ASP JES/JES2/JES3 http://www.garlic.com/~lynn/2005q.html#19 HASP/ASP JES/JES2/JES3 http://www.garlic.com/~lynn/2005q.html#30 HASP/ASP JES/JES2/JES3 http://w
Re: Google Architecture
Anne & Lynn Wheeler wrote:
> however, the cluster scaling has evolved in a number of ways. high-energy
> physics picked it up and evolved it as something called GRID. a number
> of vendors also contributed a lot of work on GRID technology and since
> are out pushing it in various commercial market segments ... including
> financial. some of the early financial adopters are using GRID for doing
> complex financial analysis in real-time.

re:
http://www.garlic.com/~lynn/2006l.html#4 Google Architecture
http://www.garlic.com/~lynn/2006l.html#6 Google Architecture
http://www.garlic.com/~lynn/2006l.html#7 Google Architecture

recent news article from yesterday

Cern seeks to tighten security for data grid
http://www.vnunet.com/computing/news/2157258/cern-seeks-tighten-security

from above:

Although large data grids are only starting to be used in business, Cern is seeing a lot of interest from industry. The lab is developing grids that will reach across organisational boundaries, allowing multiple institutions to share resources.

‘Businesses are now becoming interested in this kind of grid,’ said Grey. ‘Its use could enable suppliers and companies to share resources and large corporations to share information between business units. Grid technology will only be adopted if the right type of security solutions are available.’

... snip ...

other references:
http://www-128.ibm.com/developerworks/library/gr-watch1.html
http://www.alphaworks.ibm.com/tech/optimalgrid
http://www-128.ibm.com/developerworks/grid
http://www.gridcomputingplanet.com/news/article.php/3281_1480781
http://www.ggf.org/UnderstandingGrids/ggf_grid_understand.php
http://www.semanticgrid.org/GGF/
http://gridcafe.web.cern.ch/gridcafe/
Re: Google Architecture
Google's CEO has made some interesting comments recently about their current IT architecture, its viability, and its costs: http://www.iht.com/articles/2006/04/21/business/GOOGLE.php Here's the section of particular relevance: Google continued to make substantial capital investments, mainly in computer servers, networking equipment and its data centers. It spent $345 million on such items in the first quarter, more than double the level of last year. Yahoo, its closest rival, spent $142 million on capital expenses in the first quarter. Referring to the sheer volume of Web site information, video and e-mail that Google's servers hold, Schmidt said: "Those machines are full. We have a huge machine crisis." Jordan Rohan of RBC Capital Markets called Google's capital spending "unfathomably high," noting that it spent the same percentage of its revenue on equipment as a wire-line phone company. "If Google's market share continues to increase, and its position as the central hub of the Internet is reinforced, an extra $1 billion is a worthwhile investment," Rohan said. "The day market share peaks, we have a problem." - - - - - $345 million in capital expenses alone (excluding the tall operating expenses for that pile). In one quarter. Good grief. Those cheap servers aren't so cheap. Also, Google's service availability is bad and seems to be getting worse. (Blogger is a mess.) Google's CEO sounds like he's starting to understand that something has to change, to his credit. There are some alternative architectures out there. For example, how does Lexis-Nexis do their work? What's their service availability? http://www.lexisnexis.com/presscenter/mediakit/datacenter.asp [ Speaking for myself. ] - - - - - Timothy F. Sipples Consulting Enterprise Software Architect, z9/zSeries IBM Japan, Ltd. 
E-Mail: [EMAIL PROTECTED]
Re: Google Architecture
Lynn Wheeler wrote:
> so the issue is effectively how fast fault isolation/recovery/tolerant
> technology becomes commoditized. this is somewhat the scenario that
> happened with RAID ... when they first appeared, they were frequently
> depreciated compared to "mainframe" DASD ... but since then, they've
> effectively turned into the standard.

I believe so. The parallel with the RAID experience is pretty striking and ought to be sobering. GOOGLE, Yahoo, Amazon and others of their ilk are really pushing the envelope with their focus on low-cost. That approach has taken them in some surprising directions in terms of their use of (data/function) duplication and redundancy and the associated impacts on traditional OLTP thinking.

Of course, if you're indexing the entire internet those low-cost thingies can still add up to big numbers, but it would be a mistake (IMO) to assume we could do it any cheaper/more efficiently with a traditional mainframe-based system design. It might be possible, but there are reasons to think it might not go well for the traditionalists.

Pat Helland (formerly with Tandem and MS, now with Amazon) has written some very lucid and entertaining discussions about how economics are changing their system design points. He was one of the originators of the Tandem Non-Stop transaction system and a life-long transaction processing bigot. Now he's talking openly about his ACID apostasy. If Pat is ready to cast that aside, I think everyone else ought to at least take it seriously.

At present the big 3 online companies are big enough and cash rich enough that they can afford a lot of awfully smart people developing these things for their own internal use in their online stores and search engines. I am wondering what will happen if/when they decide to commoditize and sell their technology on the open market. Although I know Pat pretty well, I have zero inside knowledge on whether such a thing is even being contemplated. 
It is just my own idle speculation based on observation of the potential market upside.

CC
Re: Google Architecture
Craddock, Chris wrote:
> Pat Helland (formerly with Tandem and MS, now with Amazon) has written
> some very lucid and entertaining discussions about how economics are
> changing their system design points. He was one of the originators of
> the Tandem Non-Stop transaction system and a life-long transaction
> processing bigot. Now he's talking openly about his ACID apostasy. If
> Pat is ready to cast that aside, I think everyone else ought to at least
> take it seriously.

a lot of the ACID (and TPC) stuff originated with Jim. When Jim left system/r group (original relational/sql implementation): http://www.garlic.com/~lynn/subtopic.html#systemr and went to tandem, we would frequently drop by and visit him. In fact, I got blamed for something called tandem memos ... couple posts with minor refs:
http://www.garlic.com/~lynn/2005c.html#50
http://www.garlic.com/~lynn/2006h.html#9

Later when we were doing ha/cmp (on non-mainframe platform) http://www.garlic.com/~lynn/subtopic.html#hacmp and out preaching availability and scale-up on commodity priced hardware http://www.garlic.com/~lynn/95.html#13 Jim and I had some disagreements ... of course he was with DEC at the time, and they were pitching vax/clusters. Of course, later he was up on the stage announcing commodity priced clusters for scale-up and availability.

for other drift ... a recent thread discussing some vax/vms and mid-range mainframe market somewhat from late 70s through later 80s.
http://www.garlic.com/~lynn/2006l.html#17
http://www.garlic.com/~lynn/2006l.html#18
http://www.garlic.com/~lynn/2006l.html#19
Re: Google Architecture
The only thing I can add is that there is no better engine than Google, is there? So I'm going to keep using Google until I find something better. It can be mainframe based if you want. Or Audi (car) based, I don't care.

BTW: outdated pages are quite useful sometimes. I have found information that had already been deleted from the original page, because I could use the outdated copy. I like it. Usually such pages are far from the first three hits.

--
Radoslaw Skorupka
Lodz, Poland
Re: Google Architecture
On Tuesday 06 June 2006 08:56, Phil Payne wrote:
> ... The most intensive spiderer (?) at present is Yahoo, ...

Perhaps, but quantity isn't synonymous with quality. From what I see in my Web server's logs, Yahoo! "stutters" a lot, i.e. it reads the same page two, three or four times a day. And it takes weeks to update its index, at least for pages on my site.

> The most up to date index is MSN - no doubt whatsoever about it.

True, but you have to put things in perspective. MSN Search has only been in business for a little over a year, and they don't have to carry ten years of baggage like Google and Yahoo. Furthermore, MSN Search very often produces a lot fewer hits than Google or Yahoo. As MSN Search matures, expect them to have the same performance, accuracy and reliability problems as Yahoo and Google have today.

--
Gilbert Saint-Flour
GSF Software
http://gsf-soft.com/
mailto:[EMAIL PROTECTED]
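The "stutter" described in this message (one crawler reading the same page several times in a day) can be looked for in a web server log roughly as follows. This is a sketch only, not the method used by the poster: it assumes the "combined" log format, the regular expression and the `"Slurp"` user-agent substring are assumptions to be checked against real logs, and the sample lines are invented.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed combined log format:
#   host - - [dd/Mon/yyyy:time zone] "METHOD path protocol" status size "referer" "user agent"
LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "\w+ (\S+)[^"]*" .* "([^"]*)"\s*$')


def repeated_fetches(lines: Iterable[str], agent_token: str) -> Dict[Tuple[str, str], int]:
    """Return (day, path) pairs that one crawler fetched more than once, with counts."""
    fetches = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and agent_token in match.group(3):
            fetches[(match.group(1), match.group(2))] += 1
    return {key: count for key, count in fetches.items() if count > 1}


# Invented sample lines, for illustration only.
log = [
    '1.2.3.5 - - [06/Jun/2006:01:00:00 +0000] "GET /a.htm HTTP/1.0" 200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.5 - - [06/Jun/2006:09:00:00 +0000] "GET /a.htm HTTP/1.0" 200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.5 - - [06/Jun/2006:17:00:00 +0000] "GET /a.htm HTTP/1.0" 200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.5 - - [06/Jun/2006:18:00:00 +0000] "GET /b.htm HTTP/1.0" 200 700 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.4 - - [06/Jun/2006:02:00:00 +0000] "GET /a.htm HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
stutter = repeated_fetches(log, "Slurp")
print(stutter)
```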
Re: Google Architecture
I think you're applying the wrong criteria. If I go to my bank's Web site and ask to see all of my accounts, I want consistent (and consistently correct) results. But when I do a Web search, I just want relevant results - in fact there is no "correct" so why should I care about consistent? Why should I care if I do the same Google search twice and the number eight and nine results are different? Or even if the first and second hits are different? I care that the top ten or so pages are "good" results. It would be nice to think that they are the best of all possible results, but that's impossible - the "best" page may have been published fifteen minutes ago.

You're looking at a race car and complaining that it wouldn't be very good at pulling a plow. Google is not a financial tool that has to balance at the end of the day. It's a search tool. If you search for something twice (in your sock drawer, your bookshelf, your physical desktop, or on Google) you're likely to get different results. If you find what you're looking for, you're still going to be happy.

Google-bashing is the new Microsoft-bashing (which in turn was the new IBM-bashing).

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of Phil Payne
Sent: Tuesday, June 06, 2006 2:04 AM
To: IBM-MAIN@BAMA.UA.EDU
Subject: Google Architecture

Just a shame it doesn't actually work. Ask any webmaster. Or check out the discussion groups on WebMasterWorld. Start at http://www.webmasterworld.com/forum30/ and read. Disaster after disaster - no search integrity at all. http://www.webmasterworld.com/forum30/34588.htm too - "Big Daddy" is a disaster that Google is desperately trying to out from under.
Re: Google Architecture
A slightly different spin. Perhaps the model for Google and the other mega-search engines actually represents a niche market: the gleaning and somewhat orderly presentation of random atoms of the world wide web. That's a gargantuan 'niche' in topographic scope, but narrow in this respect: a query only needs to get back an answer that satisfies the asker within the first few dozen guesses.

I'm always amazed when Google satisfies me within the first handful of guesses, but I don't demand instant gratification. I'll browse through a screen or two, then try to reformulate the query. I can quickly skip over most off-base guesses, and I tolerate repetitive guesses that show up along the way. But since each query costs me nothing but time, I can afford to circle the wagons and chip away at the chaos until some semblance of intelligence appears. Furthermore, my motivation for asking is often recreational: scratching the curiosity itch or settling a friendly dispute. I don't gamble, so front line profit/loss is seldom a factor.

Meanwhile, Google makes money not from the answers they shovel my way but from the advertisers who pay to peddle their wares on page displays. The more times I reformulate an unsatisfied question, the more hits Google racks up. This is not a conspiracy, just the way it works. I keep at it because Google usually delivers. Eventually. I'm induced to persevere because every query costs me the same: nothing.

Contrast this model to a traditional business enterprise I once worked at. (Then) TRW Business Credit service collected accounts receivable data from all manner of suppliers large and small. A potential creditor would query the financial soundness of a new customer by pulling a report of how well the buyer was paying bills to other suppliers. Standard inquiry stuff, but the wrinkle was the complexity of matching queries with account records in the data base.

ABC Pipe Fitters of Duluth may sound like simple look-up, but subtle variations in name, address, and even supposedly unique entity numbers made gathering up the 'guesses' a dicey proposition. At stake here was not cocktail party one-upmanship but serious matters of money and reputation. If a report listed erroneous matches, the fallout could be severe: failure to note payment problems or successes could cause nasty results either way. Same for failure to list true matches: news too rosy or unfairly bad.

On top of all this uncertainty, the report was paid for by the requestor. Two reports cost twice as much as one. And two reports on the same business had better be identical or the requestor would be hopping unhappy. We employed several folks not then prominent in the IT industry to exercise and hone the search mechanism. It was a serious foray into fuzzy logic before I first read that term in print. And it all ran on MVS.

Would the Google model have worked in that business environment? Not with the flaws pointed out in this thread. How about querying your bank or credit card online? Your utility usage? Your school records or the maintenance history of your car? In such environments, what you demand is a few good answers. So, other than the 'niche' market of roaming the web in search of stems and pieces of information ranging from titillating to useless, what real business would put up with Google's imprecision? Who could afford to?

"Craddock, Chris" <[EMAIL PROTECTED]>
Sent by: IBM Mainframe Discussion List
06/04/2006 01:47 PM
Please respond to IBM Mainframe Discussion List
To IBM-MAIN@BAMA.UA.EDU
cc
Subject Re: Google Architecture

Of course, if you're indexing the entire internet those low-cost thingies can still add up to big numbers, but it would be a mistake (IMO) to assume we could do it any cheaper/more efficiently with a traditional mainframe-based system design. It might be possible, but there are reasons to think it might not go well for the traditionalists.

CC
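The name-matching problem described in the post above (variants of "ABC Pipe Fitters of Duluth") can be illustrated with a small sketch. This is not the TRW system, whose internals the post does not describe. It uses Python's standard difflib.SequenceMatcher as a stand-in similarity measure, and the record names, the normalize and candidates helpers, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical account records, invented for illustration only.
RECORDS = [
    "ABC Pipe Fitters of Duluth",
    "A.B.C. Pipefitters, Duluth MN",
    "ABC Plumbing Supply",
    "Acme Pipe Fitters of Denver",
]


def normalize(name):
    """Lowercase, turn punctuation into spaces, collapse runs of whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidates(query, records, threshold=0.8):
    """Return (score, record) pairs at or above the threshold, best first.

    The threshold is an assumed tuning knob: lower values list more
    erroneous matches, higher values miss more true ones.
    """
    scored = [(similarity(query, rec), rec) for rec in records]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    for score, rec in candidates("ABC Pipe Fitters of Duluth", RECORDS, threshold=0.5):
        print(f"{score:.2f}  {rec}")
```

The function is deterministic, so two reports on the same query against the same records come back identical, which is the property the post says paying customers insisted on.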
Re: Google Architecture
Phil Payne <[EMAIL PROTECTED]> wrote:
>Simple test - do a search on Google. Any search. Let it default to 10 hits per page, and collect all the pages.
>Then repeat the search, asking for 100 hits per page.
>Compare the results. They will be different. What they don't tell you is that each of your 10-per-page searches might be served by a different data center, using different indices and different databases.
>Google is a sham, as perusal of the sources above will rapidly show. It succeeds because the mass of people trust what computers produce and there are no checks or balances. Internet search engines are the largest unregulated aspect of human activity.

No, Google succeeds because "Good enough is good enough" (SM, me). It works well enough to satisfy end-users, so they use it. Yes, Big Daddy is a problem; yes, many webmasters are unhappy; but Google continues to work well enough to power the Internet economy (high-falutin' words, but, in my experience, NOT overblown).

The complaints about varied results, stale pages, intermittent spidering, etc. go back to ... Well, forever. They're evidence of suboptimal-ness, not "failure" or "a sham". I'm not in love with Google, have no stake in them (wish I did!), but the vitriol heaped upon them is unreasonable. Google works, period. 10**n successful searches per day prove that empirically.

...phsiii
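Phil Payne's test quoted above can be stated precisely. The sketch below compares the result URLs collected ten per page with the list returned by a single 100-per-page request. Fetching is deliberately left out: no search API is assumed, both result sets are plain lists supplied by the caller, and the function names are hypothetical.

```python
def flatten(pages):
    """Concatenate per-page result lists into one ranked list."""
    return [url for page in pages for url in page]


def compare_rankings(paged, single):
    """Compare results gathered page by page with one large result list.

    paged  -- list of pages, each a list of URLs in rank order
    single -- list of URLs from the single large request, in rank order
    """
    combined = flatten(paged)
    single = list(single)
    # Ranks where both lists have an entry but the entries differ.
    position_diffs = [
        (rank, a, b)
        for rank, (a, b) in enumerate(zip(combined, single))
        if a != b
    ]
    return {
        "consistent": combined == single,
        "position_diffs": position_diffs,
        "only_in_paged": sorted(set(combined) - set(single)),
        "only_in_single": sorted(set(single) - set(combined)),
    }


if __name__ == "__main__":
    # Made-up URLs standing in for real search results.
    paged = [["a", "b"], ["c", "d"]]
    single = ["a", "c", "b", "e"]
    print(compare_rankings(paged, single))
```

Whether a non-empty difference matters is exactly what the thread disputes: it shows inconsistency between the two runs, not which of them is the better answer.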
Re: Google Architecture
Perhaps a lot of the enmity (especially in this community) toward Google stems from their success and from the public perception (and possibly the IT perception) that they provide the ideal IT environment - they "prove" that the cheap, distributed server setup is viable, even though all they really provide is rather high availability. Since a lot of us have to provide similar availability, with a lot better/more timely services, yes, I guess that can burn a bit (especially with all the crap that mainframes have to take about being outdated). Aaron
Re: Google Architecture
I will be interested in seeing whether Google can reproduce their search-engine success with gmail. When your selling point is that people can retain and search gigabytes of email, you can't get away with some of the things you can as a simple search-engine. As a new gmail user, I am so far underwhelmed. It is still a work in progress, though, so it should be interesting times (hopefully in a good way). Jon
Re: Google Architecture
Google is entering many more information service businesses than just their popular Internet search engine. Many of those other businesses do require consistency in results. Also, as Internet search matures, I think people will expect more consistency and currency. - - - - - Timothy F. Sipples Consulting Enterprise Software Architect, z9/zSeries IBM Japan, Ltd. E-Mail: [EMAIL PROTECTED]
Re: Google Architecture
I doubt it. All the major players in the webmail market have expanded their storage to match Gmail's. And Yahoo just blew the doors off nearly everyone with their new beta web client. It looks like Outlook webmail, but it's better and faster. The best web app I've ever used, and I've hated plenty. No - Gmail did not impress here either.

"Jon Brock" <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> I will be interested in seeing whether Google can reproduce their search-engine success with gmail. When your selling point is that people can retain and search gigabytes of email, you can't get away with some of the things you can as a simple search-engine.
>
> As a new gmail user, I am so far underwhelmed. It is still a work in progress, though, so it should be interesting times (hopefully in a good way).
>
> Jon
Re: Google Architecture
oh and late breaking topic drift:

Bank admits flaws in chip and PIN security
http://www.dailymail.co.uk/pages/live/articles/news/news.html?in_article_id=385811&in_page_id=1770

Millions at risk from chip and Pin
http://www.thisismoney.co.uk/saving-and-banking/article.html?in_article_id=409616&in_page_id=7

Millions in danger from chip and pin fraudsters
http://www.dailymail.co.uk/pages/live/articles/news/news.html?in_article_id=389084&in_page_id=1770&in_a_source=

UK Banks Expected To Move To DDA EMV Cards
http://www.epaynews.com/index.cgi?survey=&ref=browse&f=view&id=11497625028614136145&block=

and some comments:
http://www.garlic.com/~lynn/aadsm24.htm#1 UK Detects Chip-And-PIN Security Flaw
http://www.garlic.com/~lynn/aadsm24.htm#2 UK Banks Expected To Move To DDA EMV Cards
Re: Google Architecture
"The best guess is that Google now has more than 450,000 servers spread over at least 25 locations around the world." - http://www.nytimes.com/2006/06/14/technology/14search.html?hp&ex=1150344000&; en=25cfc1be85c1d603&ei=5094 watch the wrap Charles
Re: Google Architecture
And now they're planning a super-duper-supercomputer site as well... http://www.iht.com/articles/2006/06/13/business/search.php

- Original Message
From: Charles Mills <[EMAIL PROTECTED]>
To: IBM-MAIN@BAMA.UA.EDU
Sent: Wednesday, June 14, 2006 1:31:02 PM
Subject: Re: Google Architecture

"The best guess is that Google now has more than 450,000 servers spread over at least 25 locations around the world." - http://www.nytimes.com/2006/06/14/technology/14search.html?hp&ex=1150344000&;; en=25cfc1be85c1d603&ei=5094 watch the wrap

Charles
Re: Google Architecture
100,000 foot view of GFS "GFS is not the future. But it shows us what the future can be." http://storagemojo.com/?page_id=152 http://storagemojo.com/?page_id=153
Re: Google Architecture
"with the advent of Google Checkout, a heavy-duty TP application, the company must have one." "What architecture is Google using to provide high-performance, large-scale transaction processing?" http://storagemojo.com/?p=177
Re: Google Architecture
Inside the Google-Plex
http://it.slashdot.org/it/06/07/10/216249.shtml
http://www.baselinemag.com/article2/0,1540,1985040,00.asp

from above:

Google runs on hundreds of thousands of servers—by one estimate, in excess of 450,000—racked up in thousands of clusters in dozens of data centers around the world.

... snip ...

also ..

How Google Works
http://www.eweek.com/article2/0,1895,1985576,00.asp

past refs:
http://www.garlic.com/~lynn/2006l.html#4 Google Architecture
http://www.garlic.com/~lynn/2006l.html#6 Google Architecture
http://www.garlic.com/~lynn/2006l.html#7 Google Architecture
http://www.garlic.com/~lynn/2006l.html#8 Google Architecture
http://www.garlic.com/~lynn/2006l.html#24 Google Architecture
http://www.garlic.com/~lynn/2006l.html#26 Google Architecture
http://www.garlic.com/~lynn/2006l.html#27 Google Architecture
http://www.garlic.com/~lynn/2006l.html#28 Google Architecture
http://www.garlic.com/~lynn/2006l.html#31 Google Architecture
http://www.garlic.com/~lynn/2006l.html#32 Google Architecture
http://www.garlic.com/~lynn/2006l.html#33 Google Architecture
http://www.garlic.com/~lynn/2006l.html#37 Google Architecture
http://www.garlic.com/~lynn/2006m.html#43 Google Architecture
Re: Google Architecture
In a message dated 7/10/2006 11:39:32 P.M. Central Standard Time, [EMAIL PROTECTED] writes: excess of 450,000—racked up in thousands of clusters in dozens of data centers around the world. >> GIGO...google in, google out! Guess the amazing thing is they keep all the 'in'
Re: Google Architecture
Anne & Lynn Wheeler wrote:
> Google runs on hundreds of thousands of servers—by one estimate, in excess of 450,000—racked up in thousands of clusters in dozens of data centers around the world.

re: http://www.garlic.com/~lynn/2006n.html#12 Google Architecture

... in somewhat similar vein

Grid Is 'It' at eBay
http://www.eweek.com/article2/0,1895,1995124,00.asp

The dramatic growth and high exposure of eBay's Web presence make it a rare example of a grid computing platform and application portfolio that are well past the pilot-project stage.

... snip ...

and for a little drift
http://www.garlic.com/~lynn/95.html#13
Homoeoteleutera / Google Architecture / sequence numbers (or whatever)
In a recent note, john gilmore said:

> Date: Sun, 30 Jul 2006 13:12:44 +
>
> My favorite---storm warning of a big word to come---is their notional usefulness in avoiding homoeoteleutera; but others may well have their own, different favorites.
>

Congratulations!

homoeoteleutera - Google Search
homoeoteleutera__ Search Advanced Search
Did you mean: homoioteleuton
No standard web pages containing all your search terms were found.
Your search - homoeoteleutera - did not match any documents.
©2006 Google

I'm trying to parse it as same/early/distant/good/??? with little success. "Homoioteleuton" could be somewhat relevant in that sequence numbers distinguish the endings of lines.

-- gil -- StorageTek INFORMATION made POWERFUL
Re: Homoeoteleutera / Google Architecture / sequence numbers (or whatever)
In a message dated 7/30/2006 10:04:58 A.M. Central Standard Time, [EMAIL PROTECTED] writes: Did you mean: homoioteleuton >> More likely homoimaginus. Has anybody written a SHARE requirement for SEQ/NOSEQ in IEASYS? It would probably be on the order of Y2K compliance.
Re: Homoeoteleutera / Google Architecture / sequence numbers (or whatever)
In a message dated 7/30/2006 10:04:58 A.M. Central Daylight Time, [EMAIL PROTECTED] writes:
>homoeoteleutera - Google Search
>homoeoteleutera__ SearchAdvanced Search
>Did you mean: homoioteleuton

I did the same, clicked on Google's suggested alternate spelling, and was taken to a Wikipedia page on which the word was spelled "homo-" in the link address but consistently misspelled "home-" all through the article. I wonder now if there is a Greek-based word meaning "consistently misspelling a word everywhere but in the metadata that points to it." :-)

The "-utera" change from "-uton" that Mr. Gilmore used is clearly nothing more than pluralizing the Greek neuter singular noun ending of "-on" into "-a", although I don't understand the insertion of the "-er". I guess. :-) But the other discrepancies may be a genuine misspelling. For shame! I wish Mr. Gilmore would pepper his erudite postings with obscure Latin-based words rather than Greek-based, as I studied Latin a lot more (4 years) than I did Greek (6 weeks). :-)

Bottom line: I believe the Wikipedia article explains Mr. Gilmore's word, although its orthography is still uncertain.

Bill Fairchild