On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote:
> On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <jo...@antarean.org> wrote:
> > Check the link posted by Douglas.
> > Uber's article has some misunderstandings about the architecture, with
> > conclusions drawn that are, at least in part, caused by their database
> > design and usage.
> 
> I've read it.  I don't think it actually alleges any misunderstandings
> about the Postgres architecture, but rather that it doesn't perform as
> well in Uber's design.  I don't think it actually alleges that Uber's
> design is a bad one in any way.

It was written quite diplomatically. Seeing the CREATE TABLE statements for the 
sample tables already made me wonder how they designed their database schema, 
especially from a performance point of view. But that is a separate discussion :)

> But, I'm certainly interested in anything else that develops here...

Same here, and I am hoping some others will also come up with some interesting 
bits.

> >> And of course almost any FOSS project could have a bug.  I
> >> don't know if either project does the kind of regression testing to
> >> reliably detect this sort of issue.
> > 
> > Not sure either, I do think PostgreSQL does a lot with regression tests.
> 
> Obviously they missed that bug.  Of course, so did Uber in their
> internal testing.  I've seen a DB bug in production (granted, only one
> so far) and they aren't pretty.  A big issue for Uber is that their
> transaction rate and DB size is such that they really don't have a
> practical option of restoring backups.

From the slides on their migration from MySQL to PostgreSQL in 2013, I see it 
took them 45 minutes to migrate 50GB of data.
To me, that seems like a very poor transfer rate for what I would consider a 
dev environment: less than 20MB/s.
I've seen "badly performing" ETL processes read 300GB of XML files and load 
that into 3 DB tables within 1.5 hours. That's about 57MB/s, with the XML 
engine using up nearly 98% of the total CPU load.

If the data had been supplied as CSV files, it would have been roughly 100GB, 
which could easily have been loaded within 20 minutes. That equals about 
85MB/s, i.e. filling up the network bandwidth.
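For reference, the back-of-the-envelope numbers behind those rates:

     50 GB in 45 min =  51200 MB / 2700 s  ~  19 MB/s
    300 GB in 90 min = 307200 MB / 5400 s  ~  57 MB/s
    100 GB in 20 min = 102400 MB / 1200 s  ~  85 MB/s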

I think their database design and infrastructure aren't optimized for their 
specific workload, which is, unfortunately, quite common.

> Obviously they'd do that in a
> complete disaster, but short of that they can't really afford to do
> so.  By the time a backup is recorded it would be incredibly out of
> date.  They have the same issue with the lack of online upgrades
> (which the responding article doesn't really talk about).  They really
> need it to just work all the time.

When I migrate PostgreSQL to a new major version, I migrate one database at a 
time to minimize downtime. This is done by piping the output of the backup 
process straight into a restore process connected to the new server.
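In its simplest form that is just something like this (hostnames and database 
name are placeholders, and the target database has to exist on the new server 
first):

    # create the empty target database on the new server
    createdb -h newhost mydb
    # dump from the old server, restore straight into the new one
    # (ideally run pg_dump from the newer version's binaries)
    pg_dump -h oldhost mydb | psql -h newhost -d mydb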

If it were even more time-critical, I would develop a migration process 
(sketched below) that would:
1) copy all the current data (as in, needed today) to the new database
2) disable the application
3) copy the latest changes for today to the new database
4) re-enable the application (pointing to the new database)
5) copy all the historical data I might need
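
A rough sketch of the copy steps, assuming a hypothetical 'trips' table with 
an 'updated_at' column (hosts, dates and names are all placeholders; this 
also treats the table as append-only, updated rows would need extra handling):

    # step 1 (application still live): copy recent data up to a cutoff
    psql -h oldhost mydb -c "\copy (SELECT * FROM trips WHERE updated_at >= '2016-07-01' AND updated_at < '2016-08-01 09:00') TO STDOUT" \
        | psql -h newhost mydb -c "\copy trips FROM STDIN"
    # step 3 (application disabled): copy everything changed since the cutoff
    psql -h oldhost mydb -c "\copy (SELECT * FROM trips WHERE updated_at >= '2016-08-01 09:00') TO STDOUT" \
        | psql -h newhost mydb -c "\copy trips FROM STDIN"
    # step 5 (application live on the new database): backfill the history
    psql -h oldhost mydb -c "\copy (SELECT * FROM trips WHERE updated_at < '2016-07-01') TO STDOUT" \
        | psql -h newhost mydb -c "\copy trips FROM STDIN"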

I would add a note on the website and send out an email first, informing the 
customers that the data is being migrated and that historical data might be 
incomplete during this process.

> >> I'd think that it is more likely
> >> that the likes of Oracle would (for their flagship DB (not for MySQL),
> > 
> > Never worked with Oracle (or other big software vendors), have you? :)
> 
> Actually, I almost exclusively work with them.  Some are better than
> others.  I don't work directly with Oracle, but I can say that the two
> times I've worked with an Oracle consultant they've been worth their
> weight in gold, and cost about as much.

They do have some good ones...

> The one was fixing some kind
> of RDB data corruption on a VAX that was easily a decade out of date
> at the time; I was shocked that they could find somebody who knew how
> to fix it.  Interestingly, it looks like they only abandoned RDB
> recently.

Probably one of the few people in the world who still could. And he/she might 
have been brought in by Oracle for this particular issue.

> They do tend to be a solution that involves throwing money at
> problems.  My employer was having issues with a database from another
> big software vendor which I'm sure was the result of bad application
> design, but throwing Exadata at it did solve the problem, at an
> astonishing price.

I was at Collaborate last year and spoke to some of the guys from Oracle (not 
going into specifics, to protect their jobs). When asked whether one of my 
customers should be using Oracle RAC or Exadata, the answer came down to: "If 
you think RAC might be sufficient, it usually is."

Exadata, however, is a really nice design. But throwing faster machines at a 
problem should only be part of the solution.
I know someone who claims he can make a "standard" Oracle database outperform 
an Exadata database. That claim is based on the (usually true) assumption that 
databases are not designed for performance.
Mind, if the same tricks were applied in an Exadata environment, you'd see 
phenomenal performance.

> Neither my employer nor the big software provider
> in question is likely to attract top-notch DB talent (indeed, mine has
> steadily gotten rid of anybody who knows how to do anything in Oracle
> beyond creating schemas it seems,

Actively? Or by simply letting the good ones go while replacing them with 
someone less clued up?

> though I can only imagine how much
> they pay annually in their license fees; and yes, I'm sure 99.9% of
> what they use Oracle (or SQL Server) for would work just fine in
> Postgres).

That is my feeling as well. The problem is that the likes of Informatica (one 
of the leading ETL software vendors) don't actually support PostgreSQL, which 
is a bit of a downside. I'd need to use ODBC (yes, that also works outside of 
MS Windows) to connect; see the sketch below.
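
For what it's worth, a minimal unixODBC setup with the psqlODBC driver looks 
roughly like this (DSN name, host, database and the config path are 
placeholders; they vary per distribution and setup):

    # /etc/unixODBC/odbc.ini -- 'PostgreSQL' must match a driver
    # entry defined in odbcinst.ini
    [pgdsn]
    Driver     = PostgreSQL
    Servername = dbhost
    Port       = 5432
    Database   = mydb

    # quick sanity check with unixODBC's isql tool
    isql -v pgdsn dbuser dbpassword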

> > Only if you're a big (as in, spend a lot of money with them) customer.
> 
> So, we are that (and I think a few of our IT execs used to be Oracle
> employees, which I'm sure isn't hurting their business). 

I actually didn't join Oracle. I did, however, work for one of the companies 
Oracle bought, and decided not to wait for the inevitable job cuts. In 
hindsight, that one wasn't too bad, as they actually kept that part going for 
nearly 8 years.

> I'll admit
> that Uber might not get the same attention.  Seems like Oracle is the
> solution at work from everything to software that runs the entire
> company to software that hosts one table for 10 employees (well, when
> somebody notices and gets it out of Access).

Don't forget the Finance departments. They tend to use Excel files for 
everything.

> Well, unless it involves
> an MS-oriented dev or Sharepoint, in which case somebody inevitably
> wants it on SQL Server.  I did mention that we're not a world-class IT
> shop, didn't I?

I won't actually name companies, but I've seen plenty of big ones that would 
fit your description. So I'm not sure what a "world-class" IT shop would look 
like once it has to deal with the internal politics, bureaucracy and 
procedures that come as standard with big companies.

--
Joost

