View this post on the web at 
https://blog.bytebytego.com/p/how-paypal-serves-350-billion-daily

Stop releasing bugs with fully automated end-to-end test coverage (Sponsored) [ 
https://substack.com/redirect/42bea5da-03c2-4479-a7e3-2df9a9ec5af5?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ]
Bugs sneak out when less than 80% of user flows are tested [ 
https://substack.com/redirect/42bea5da-03c2-4479-a7e3-2df9a9ec5af5?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ] before shipping. But how do you get that kind of coverage? You either spend 
years scaling in-house QA — or you get there in just 4 months with QA Wolf [ 
https://substack.com/redirect/42bea5da-03c2-4479-a7e3-2df9a9ec5af5?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ].
How's QA Wolf different?
They don't charge hourly.
They guarantee results.
They provide all of the tooling and (parallel run) infrastructure needed to run 
a 15-minute QA cycle.
Have you ever seen a database that fails and comes up again in the blink of an 
eye?
PayPal’s JunoDB is a database capable of doing so. As per PayPal’s claim, 
JunoDB can run at 6 nines of availability (99.9999%). This comes to just 86.40 
milliseconds of downtime per day.
For reference, our average eye blink takes around 100-150 milliseconds.
While the statistics are certainly amazing, it also means that there are many 
interesting things to pick up from JunoDB’s architecture and design.
In this post, we will cover the following topics:
JunoDB’s Architecture Breakdown
How JunoDB achieves scalability, availability, performance, and security
Use cases of JunoDB
Key Facts about JunoDB
Before we go further, here are some key facts about JunoDB that can help us 
develop a better understanding of it.
JunoDB is a distributed key-value store. Think of a key-value store as a 
dictionary where you look up a word (the “key”) to find its definition (the 
“value”).
JunoDB leverages a highly concurrent architecture implemented in Go to 
efficiently handle hundreds of thousands of connections.
At PayPal, JunoDB serves almost 350 billion daily requests and is used in every 
core backend service, including critical functionalities like login, risk 
management, and transaction processing.
PayPal primarily uses JunoDB for caching to reduce the load on the main 
source-of-truth database. However, there are also other use cases that we will 
discuss in a later section.
The diagram shows how JunoDB fits into the overall scheme of things at PayPal.
Why the Need for JunoDB?
One common question surrounding the creation of something like JunoDB is this:
“Why couldn’t PayPal just use something off-the-shelf like Redis?”
The reason is PayPal wanted multi-core support for the database and Redis is 
not designed to benefit from multiple CPU cores. It is single-threaded in 
nature and utilizes only one core. Typically, you need to launch several Redis 
instances to scale out on several cores if needed.
Incidentally, JunoDB started as a single-threaded C++ program and the initial 
goal was to use it as an in-memory short TTL data store.
For reference, TTL stands for Time to Live. It specifies the maximum duration a 
piece of data should be retained or the maximum time it is considered valid.
However, the goals for JunoDB evolved with time.
First, PayPal wanted JunoDB to work as a persistent data store supporting long 
TTLs.
Second, JunoDB was also expected to provide improved data security via on-disk 
encryption and TLS in transit by default.
These goals meant that JunoDB had to be CPU-bound rather than memory-bound.
For reference, “memory-bound” and “CPU-bound” refer to different performance 
aspects in computer programs. As the name suggests, the performance of 
memory-bound programs is limited by the amount of available memory. On the 
other hand, CPU-bound programs depend on the processing power of the CPU.
For example, Redis is memory-bound. It primarily stores the data in RAM and 
everything about it is optimized for quick in-memory access. The limiting 
factor for the performance of Redis is memory rather than CPU.
However, requirements like encryption are CPU-intensive because many 
cryptographic algorithms require raw processing power to carry out complex 
mathematical calculations.
As a result, PayPal decided to rewrite the earlier version of JunoDB in Go to 
make it multi-core friendly and support high concurrency.
The Architecture of JunoDB
The below diagram shows the high-level architecture of JunoDB.
Let’s look at the main components of the overall design.
1 - JunoDB Client Library
The client library is part of the client application and provides an API for 
storing and retrieving data via the JunoDB proxy.
It is implemented in several programming languages such as Java, C++, Python, 
and Golang to make it easy to use across different application stacks.
For developers, it’s just a matter of picking the library for their respective 
programming language and including it in the application to carry out the 
various operations.
2 - JunoDB Proxy with Load Balancer
JunoDB utilizes a proxy-based design where the proxy connects to all JunoDB 
storage server instances.
This design has a few important advantages:
The complexity of determining which storage server should handle a query is 
kept out of the client libraries. Since JunoDB is a distributed data store, the 
data is spread across multiple servers. The proxy handles the job of directing 
the requests to the correct server.
The proxy is also aware of the JunoDB cluster configuration (such as shard 
mappings) stored in the ETCD key-value store.
But can the JunoDB proxy turn into a single point of failure?
To prevent this possibility, the proxy runs on multiple instances downstream to 
a load balancer. The load balancer receives incoming requests from the client 
applications and routes the requests to the appropriate proxy instance.
3 - JunoDB Storage Servers
The last major component in the JunoDB architecture is the storage servers.
These are instances that accept the operation requests from the proxy and store 
data in the memory or persistent storage.
Each storage server is responsible for a set of partitions or shards for an 
efficient distribution of data.
Internally, JunoDB uses RocksDB as the storage engine. Using an off-the-shelf 
storage engine like RocksDB is common in the database world to avoid building 
everything from the ground up. For reference, RocksDB is an embedded key-value 
storage engine that is optimized for high read and write throughput.
Key Priorities of JunoDB
Now that we have looked at the overall design and architecture of JunoDB, it’s 
time to understand a few key priorities for JunoDB and how it achieves them.
Scalability
Several years ago, PayPal transitioned to a horizontally scalable 
microservice-based architecture to support the rapid growth in active customers 
and payment rates.
While microservices solve many problems for them, they also have some drawbacks.
One important drawback is the increased number of persistent connections to 
key-value stores due to scaling out the application tier. JunoDB handles this 
scaling requirement in two primary ways.
1 - Scaling for Client Connections
As discussed earlier, JunoDB uses a proxy-based architecture.
If client connections to the database reach a limit, additional proxies can be 
added to support more connections. 
There is an acceptable trade-off with latency in this case.
2 - Scaling for Data Volume and Throughput
The second type of scaling requirement is related to the growth in data size.
To ensure efficient storage and data fetching, JunoDB supports partitioning 
based on the consistent hashing algorithm. Partitions (or shards) are 
distributed to physical storage nodes using a shard map.
Consistent hashing is very useful in this case because when the nodes in a 
cluster change due to additions or removals, only a minimal number of shards 
require reassignment to different storage nodes.
PayPal uses a fixed number of shards (1024 shards, to be precise), and the 
shard map is pre-generated and stored in ETCD storage. 
Any change to the shard mapping triggers an automatic data redistribution 
process, making it easy to scale your JunoDB cluster depending on the need.
The below diagram shows the process in more detail.
Availability
High availability is critical for PayPal. You can’t have a global payment 
platform going down without a big loss of reputation.
However, outages can and will occur due to various reasons such as software 
bugs, hardware failures, power outages, and even human error. Failures can lead 
to data loss, slow response times, or complete unavailability.
To mitigate these challenges, JunoDB relies on replication and failover 
strategies.
1 - Within-Cluster Replication
In a cluster, JunoDB storage nodes are logically organized into a grid. Each 
column represents a zone, and each row signifies a storage group.
Data is partitioned into shards and assigned to storage groups. Within a 
storage group, each shard is synchronously replicated across various zones 
based on the quorum protocol.
The quorum-based protocol is the key to reaching a consensus on a value within 
a distributed database. You’ve two quorums:
The Read Quorum: When a client wants to read data, it needs to receive 
responses from a certain number of zones (known as the read quorum). This is to 
make sure that it gets the most up-to-date data.
The Write Quorum: When the client wants to write data, it must receive 
acknowledgment from a certain number of zones to make sure that the data is 
written to a majority of the zones.
There are two important rules when it comes to quorum.
The sum of the read quorum and write quorum must be greater than the number of 
zones. If that’s not the case, the client may end up reading outdated data. For 
example, if there are 5 zones with read quorum as 2 and write quorum as 3, a 
client can write data to 3 zones but another client may read from the 2 zones 
that have not yet received the updated data.
The write quorum must be more than half the number of zones to prevent two 
concurrent write operations on the same key. For example, if there is a JunoDB 
cluster with 5 zones and a write quorum of 2, client A may write value X to key 
K and is considered successful when 2 zones acknowledge the request. Similarly, 
client B may write value Y to the same key K and is also successful when two 
different zones acknowledge the request. Ultimately, the data for key K is in 
an inconsistent state.
In production, PayPal has a configuration with 5 zones, a read quorum of 3, and 
a write quorum of 3.
Lastly, the failover process in JunoDB is automatic and instantaneous without 
any need for leader re-election or data redistribution. Proxies can know about 
a node failure through a lost connection or a read request that has timed out.
2 - Cross-data center replication
Cross-data center replication is implemented by asynchronously replicating data 
between the proxies of each cluster across different data centers.
This is important to make sure that the system continues to operate even if 
there’s a catastrophic failure at one data center.
Performance
One of the critical goals of JunoDB is to deliver high performance at scale.
This translates to maintaining single-digit millisecond response times while 
providing a great user experience.
The below graphs shared by PayPal show the benchmark results demonstrating 
JunoDB’s performance in the case of persistent connections and high throughput.
Security
Being a trusted payment processor, security is paramount for PayPal. 
Therefore, it’s no surprise that JunoDB has been designed to secure data both 
in transit and at rest. 
For transmission security, TLS is enabled between the client and proxy as well 
as proxies in different data centers used for replication.
Payload encryption is performed at the client or proxy level to prevent 
multiple encryptions of the same data. The ideal approach is to encrypt the 
data on the client side but if it’s not done, the proxy figures it out through 
a metadata flag and carries out the encryption.
All data received by the storage server and stored in the engine are also 
encrypted to maintain security at rest.
A key management module is used to manage certificates for TLS, sessions, and 
the distribution of encryption keys to facilitate key rotation,
The below diagram shows JunoDB’s security setup in more detail.
Use Cases of JunoDB
With PayPal having made JunoDB open-source, it’s possible that you can also use 
it within your projects.
There are various use cases where JunoDB can help. Let’s look at a few 
important ones:
1 - Caching
You can use JunoDB as a temporary cache to store data that doesn’t change 
frequently.
Since JunoDB supports both short and long-lived TTLs, you can store data from a 
few seconds to a few days. For example, a use case is to store short-lived 
tokens in JunoDB instead of fetching them from the database.
Other items you can cache in JunoDB are user preferences, account details, and 
API responses.
2 - Idempotency
You can also use JunoDB to implement idempotency.
An operation is idempotent when it produces the same result even when applied 
multiple times. With idempotency, repeating the operation is safe and you don’t 
need to be worried about things like duplicate payments getting applied.
PayPal uses JunoDB to ensure they don’t process a particular payment multiple 
times due to retries. JunoDB’s high availability makes it an ideal data store 
to keep track of processing details without overloading the main database.
3 - Counters
Let’s say you’ve certain resources that aren’t available for some reason or 
they have an access limit to their usage. For example, these resources can be 
database connections, API rate limits, or user authentication attempts.
You can use JunoDB to store counters for these resources and track whether 
their usage exceeds the threshold.
4 - Latency Bridging
As we discussed earlier, JunoDB provides fast inter-cluster replication. This 
can help you deal with slow replication in a more traditional setup.
For example, in PayPal’s case, they run Oracle in Active-Active mode, but the 
replication usually isn’t as fast as they would like for their requirement.
It means there are chances of inconsistent reads if records written in one data 
center are not replicated in the second data center and the first data center 
goes down.
JunoDB can help bridge the latency where you can write to Data Center A (both 
Oracle and JunoDB) and even if it goes down, you can read the updates 
consistently from the JunoDB instance in Data Center B.
See the below diagram for a better understanding of this concept.
Conclusion
JunoDB is a distributed key-value store playing a crucial role in various 
PayPal applications. It provides efficient data storage for fast access to 
reduce the load on costly database solutions.
While doing so, it also fulfills critical requirements such as scalability, 
high availability with performance, consistency, and security.
Due to its advantages, PayPal has started using JunoDB in multiple use cases 
and patterns. For us, it provides a great opportunity to learn about an 
exciting new database system.
References:
Unlocking the Power of JunoDB: PayPal’s Key-Value Store Goes Open-Source [ 
https://substack.com/redirect/6729a7ef-3797-4602-8ce1-92feeda906b7?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ]
JunoDB: PayPal Open Sources Key-Value Store [ 
https://substack.com/redirect/2ca64a99-64d6-41f6-8153-72297c205454?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ]
Redis Benchmark: Pitfalls and misconceptions [ 
https://substack.com/redirect/2a036e60-3c22-459b-a159-d5aa6d668864?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ]
Memory Bound Function [ 
https://substack.com/redirect/3d722ca1-ed0e-4265-8cc0-878524ba082c?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ]
High Availability [ 
https://substack.com/redirect/bf563879-2d71-453a-9131-69b81cefdbc4?j=eyJ1IjoiMXBxZnM3In0.Bn7pYfJL44qVpLsyOfEHZjPX-L54Pv8N6hc-GQqMSDU
 ]

Unsubscribe 
https://substack.com/redirect/2/eyJlIjoiaHR0cHM6Ly9ibG9nLmJ5dGVieXRlZ28uY29tL2FjdGlvbi9kaXNhYmxlX2VtYWlsP3Rva2VuPWV5SjFjMlZ5WDJsa0lqb3hNRE0yT1RBd09EY3NJbkJ2YzNSZmFXUWlPakUwTXpZeU1qTTVOQ3dpYVdGMElqb3hOekV6TWpneE5UYzNMQ0psZUhBaU9qRTNNVFU0TnpNMU56Y3NJbWx6Y3lJNkluQjFZaTA0TVRjeE16SWlMQ0p6ZFdJaU9pSmthWE5oWW14bFgyVnRZV2xzSW4wLnhMYlNPNG1WWDZCS21HMzl6NU9CSWxSN0JjNU1MLW5aVm5TMDJpUXMzd0kmZXhwaXJlcz0zNjVkIiwicCI6MTQzNjIyMzk0LCJzIjo4MTcxMzIsImYiOnRydWUsInUiOjEwMzY5MDA4NywiaWF0IjoxNzEzMjgxNTc3LCJleHAiOjE3MTU4NzM1NzcsImlzcyI6InB1Yi0wIiwic3ViIjoibGluay1yZWRpcmVjdCJ9.vzSHE_iwTtP9npFmnCSxkNWN8kYwgaEJHM4bkcXEfWM?

Reply via email to