Re: [bitcoin-dev] [patch] Switching Bitcoin Core to sqlite db

2015-10-29 Thread telemaco via bitcoin-dev

Why not allow two options:

1/ a default RocksDB/SQLite/LevelDB (whatever is decided)
2/ alternatively, provide instructions for connecting to any other RDBMS
using ODBC or JDBC.


Why not allow async disk writes or incredibly fast database systems
if someone wants to run a node in a very fast datacenter, or connect it
to the database servers they already operate? Using the open standards
for database connectivity is just the traditional approach.


Any person or organization would just need one machine running their
bitcoin node with an RDBMS client installed (SAP Sybase, Oracle, or
Microsoft). The bitcoin node would store its data over ODBC/JDBC on
ANY RDBMS installed anywhere in their organization (on another machine
or the same one). They would just need to issue a "create table" with
a very simple table structure, and they would benefit from async
writes and indexes while using the already licensed and configured
system of their choosing. Bitcoin information would also become
available to thousands of software packages and to the thousands of
programmers who work with RDBMSs, and not just "RocksDB" or some
obscure database system.


Why not "outsource" that data management part entirely to the existing
database world, with its decades of experience? People could very
easily create bitcoin statistics/graphs/analysis with existing
software packages (hey, even Excel or LibreOffice) or connect bitcoin
data to their own sources and, if they so chose, analyze bitcoin data
in a data warehouse or by any other imaginable approach. Of course,
every transaction would have to go through the bitcoin node, and only
the data management would be on the RDBMS side.



___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] [patch] Switching Bitcoin Core to sqlite db

2015-10-29 Thread Gregory Maxwell via bitcoin-dev
On Thu, Oct 29, 2015 at 6:57 AM, telemaco via bitcoin-dev
 wrote:
> Why not "outsource" totally that data management part to the already
> existing with decades of experience database world. People would be able to
> create incredibly easy bitcoin statistics/graphs/analysis with existing
> software packages (hey even excel or libreoffice like) or connect bitcoin
> data to their own sources and if so they chose analyze bitcoin data on a
> datawarehouse or any imaginable approach. Of course every transaction would
> have to go through the bitcoin node and only the data management would be
> on rdbms side.

The word "database" is likely confusing people here.  This is not a
database in an ordinary sense.

The Bitcoin Core consensus engine requires a highly optimized,
ultra-compact data structure to perform the lookups for coin
existence. The data stored is highly compressed and very specialized;
it would not be useful to other applications.  Right now, on boring
laptop hardware, during network synchronization, updates to this
database run at over 10,000 records per second, while the system is
also busy doing the other validation chores of a node. This is backed
by a high-performance transactional key-value store.  The need for
performance here is essential to even keeping up with the network.
It's not about enabling any kind of fancy querying (Bitcoin Core does
not offer fancy querying); it's about the base load that every node
must handle to
usably sync up and keep up with the Bitcoin network.
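
As a rough illustration of the "lookups for coin existence" described
above, here is a minimal Python sketch of a UTXO set as a plain
key-value map. The class name, method names and the opaque byte values
are assumptions made for illustration only; this is not Bitcoin Core's
actual data structure or serialization.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (transaction id, output index).
OutPoint = Tuple[str, int]


class UtxoSet:
    """Toy UTXO set: keys are outpoints, values are opaque coin bytes."""

    def __init__(self) -> None:
        self._coins: Dict[OutPoint, bytes] = {}

    def have_coin(self, outpoint: OutPoint) -> bool:
        # The hot-path query: does this unspent output exist?
        return outpoint in self._coins

    def apply_tx(self, spends: List[OutPoint],
                 creates: Dict[OutPoint, bytes]) -> bool:
        # Reject the transaction if it spends the same output twice
        # or spends an output that is not in the set.
        if len(set(spends)) != len(spends):
            return False
        if not all(self.have_coin(o) for o in spends):
            return False
        for o in spends:
            del self._coins[o]       # spent outputs leave the set
        self._coins.update(creates)  # new outputs enter the set
        return True
```

Every transaction turns into a handful of existence checks, deletes
and inserts, which is why raw key-value throughput matters more here
than query flexibility.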

The backend can be swapped out for something else that provides the
same properties, but doing so does not give you any of the
inspection/analytics that you're looking for.  Systems that do that
exist, and they require databases taking hundreds of gigabytes of
storage and take days to weeks to import the network data.  They're
great for what they're for, but they're not suitable for consensus use
in the system for space efficiency, performance, and consensus
consistency reasons.


Re: [bitcoin-dev] [patch] Switching Bitcoin Core to sqlite db

2015-10-29 Thread Luke Dashjr via bitcoin-dev
On Thursday, October 29, 2015 6:57:39 AM telemaco via bitcoin-dev wrote:
> Why not allow two options:
> 
> 1/ a default RocksDB/SQLite/LevelDB (whatever is decided)
> 2/ alternatively, provide instructions for connecting to any other RDBMS
> using ODBC or JDBC.

I predict this would be a disaster. UTXO storage is CONSENSUS-CRITICAL code.
Any divergence in implementation behaviour, including bugs AND bugfixes, may 
cause consensus failure. For this to have a reasonable *hope* of working, we 
need to choose one storage engine, and *will* need to maintain consensus-
compatibility of it ourselves (since nobody else cares).

Fixing LevelDB frankly seems like an easier task than switching to anything 
SQL-based, which would require a *lot* more *difficult-to-get-consensus-
compatible* code that we are all (or at least mostly) very unfamiliar with.

Research is fine, but let's be realistic about deployment.

Luke


Re: [bitcoin-dev] [patch] Switching Bitcoin Core to sqlite db

2015-10-29 Thread Gregory Maxwell via bitcoin-dev
On Fri, Oct 30, 2015 at 3:04 AM, Simon Liu via bitcoin-dev
 wrote:
> Given that UTXO storage is considered critical, it seems reasonable to

This sounds like a misunderstanding of what consensus-critical means.
It does not mean that it must be right (though obviously that is
preferable) but that it must be _consistent_ between all nodes.

> full node and keep up with the network, why not let those users with the
> resources to operate big iron databases do so?  It would be a good
> feature to have.

Because it provides no value: the data is opaque and proprietarily
encoded with a compression function which we may change from version
to version, and many of these alternatives are enormously slow, enough
that they present problems with falling behind the network even on
high-performance hardware.

Moreover, additional functionality which will not be sufficiently used
will not be adequately maintained, resulting in increased maintenance
costs and more bugs.


Re: [bitcoin-dev] [patch] Switching Bitcoin Core to sqlite db

2015-10-29 Thread Gregory Maxwell via bitcoin-dev
On Fri, Oct 30, 2015 at 4:04 AM, Peter R  wrote:
> Can you give a specific example of how nodes that used different database 
> technologies might determine different answers to whether a given transaction 
> is valid or invalid?  I’m not a database expert, but to me it would seem that 
> if all the unspent outputs can be found in the database, and if the relevant 
> information about each output can be retrieved without corruption, then 
> that’s all that really matters as far as the database is concerned.

If you add to that set of assumptions that the handling of write
ordering is the same (e.g. multiple updates in a change end up with the
same entry surviving) and that read/write interleaving returns the same
results, then it wouldn't.
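
To make the write-ordering assumption concrete, here is a small Python
sketch of two batch-apply policies that disagree when one batch
updates the same key twice. Both functions and the key/value names are
hypothetical; they do not describe how any particular database
behaves.

```python
from typing import Dict, List, Tuple

Batch = List[Tuple[str, bytes]]


def apply_batch_last_wins(store: Dict[str, bytes], batch: Batch) -> Dict[str, bytes]:
    # Later writes to the same key overwrite earlier ones.
    result = dict(store)
    for key, value in batch:
        result[key] = value
    return result


def apply_batch_first_wins(store: Dict[str, bytes], batch: Batch) -> Dict[str, bytes]:
    # Within one batch, the first write to a key survives and
    # later writes to that key are ignored.
    result = dict(store)
    seen = set()
    for key, value in batch:
        if key not in seen:
            result[key] = value
            seen.add(key)
    return result


# One batch that touches the same (made-up) key twice.
batch = [("coin-a", b"v1"), ("coin-a", b"v2")]
state_last = apply_batch_last_wins({}, batch)
state_first = apply_batch_first_wins({}, batch)
```

Both policies are internally coherent, but two nodes using different
ones would end up holding different entries for the same key.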

But databases sometimes have errors which cause them to fail to return
records, or to return stale data. And if those errors exist,
consistency must still be maintained; "fixing" the bug can cause a
divergence in consensus state that could open users up to theft.

Case in point, prior to leveldb's use in Bitcoin Core it had a bug
that, under rare conditions, could cause it to consistently return not
found on records that were really there (I'm running from memory so I
don't recall the specific cause).  Leveldb fixed this serious bug in a
minor update.  But deploying a fix like this in an uncontrolled manner
in the bitcoin network would potentially cause a fork in the consensus
state; so any such fix would need to be rolled out in an orderly
manner.
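
A minimal Python sketch of how such a fix can split validation
results. The "unreadable" set below is a made-up stand-in for a store
that consistently fails to find certain records; it does not model the
actual LevelDB bug, whose cause is not given here, and all names are
hypothetical.

```python
from typing import Iterable, List, Set, Tuple

OutPoint = Tuple[str, int]


class ReferenceStore:
    """Store that finds every record it holds."""

    def __init__(self, coins: Iterable[OutPoint]) -> None:
        self._coins: Set[OutPoint] = set(coins)

    def have_coin(self, outpoint: OutPoint) -> bool:
        return outpoint in self._coins


class LossyStore(ReferenceStore):
    """Same data, but consistently reports some records as not found."""

    def __init__(self, coins: Iterable[OutPoint],
                 unreadable: Iterable[OutPoint]) -> None:
        super().__init__(coins)
        self._unreadable: Set[OutPoint] = set(unreadable)

    def have_coin(self, outpoint: OutPoint) -> bool:
        return outpoint in self._coins and outpoint not in self._unreadable


def inputs_exist(store: ReferenceStore, inputs: List[OutPoint]) -> bool:
    # A transaction is only acceptable if every input it spends exists.
    return all(store.have_coin(i) for i in inputs)


coins = [("aa", 0), ("bb", 1)]
spend = [("bb", 1)]

old_node = LossyStore(coins, unreadable=[("bb", 1)])  # not yet upgraded
fixed_node = ReferenceStore(coins)                    # runs the bugfix

old_verdict = inputs_exist(old_node, spend)
fixed_verdict = inputs_exist(fixed_node, spend)
```

The two nodes hold identical data yet disagree about the same
transaction, which is the kind of split an uncontrolled rollout risks.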

> I’d like a concrete example to help me understand why more than one 
> implementation of something like the UTXO database would be unreasonable.

It's not unreasonable, but great care is required around the specifics.

Bitcoin consensus implements a mathematical function that defines the
operation of the system, and above all else all systems must agree (or
else the state can diverge and permit double-spends).  If you could
prove that a component behaves identically to another under all inputs
then it could be replaced without concern, but this is something that
cannot be done generally for all software, and proving equivalence
even in special cases is an open area of research.  When the software
itself is identical or nearly so, it is much easier to gain confidence
in the equivalence of a change through testing and review.

With that cost in mind one must then consider the other side of the
equation: the UTXO database is an opaque compressed representation.
Several of the posts here have been about the desirability of
blockchain analysis interfaces, and I agree they're sometimes
desirable, but access to the consensus UTXO database is not helpful
for that.  Similarly, other things suggested are so phenomenally slow
that it's unlikely that a node would catch up and stay synced even on
powerful hardware.  Regardless, in Bitcoin Core the storage engine for
this is fully internally abstracted, and so it is relatively
straightforward for someone to drop something else in to experiment
with, whatever the motivation.
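
For anyone wanting to experiment along those lines, here is a Python
sketch of what a swappable storage interface and a randomized
equivalence check might look like. The CoinStore interface, both
backends and the table layout are assumptions for illustration; none
of this is Bitcoin Core's real abstraction, and passing a random test
gives some confidence, not a proof of equivalence.

```python
import random
import sqlite3
from typing import Dict, Optional, Protocol


class CoinStore(Protocol):
    """Hypothetical minimal interface a backend would have to provide."""

    def put(self, key: bytes, value: bytes) -> None: ...
    def get(self, key: bytes) -> Optional[bytes]: ...
    def delete(self, key: bytes) -> None: ...


class DictStore:
    """In-memory backend built on a Python dict."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class SqliteStore:
    """Backend built on an in-memory SQLite table."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE coins (k BLOB PRIMARY KEY, v BLOB NOT NULL)")

    def put(self, key: bytes, value: bytes) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO coins (k, v) VALUES (?, ?)", (key, value))

    def get(self, key: bytes) -> Optional[bytes]:
        row = self._db.execute(
            "SELECT v FROM coins WHERE k = ?", (key,)).fetchone()
        return None if row is None else bytes(row[0])

    def delete(self, key: bytes) -> None:
        self._db.execute("DELETE FROM coins WHERE k = ?", (key,))


def backends_agree(a: CoinStore, b: CoinStore,
                   steps: int = 1000, seed: int = 0) -> bool:
    """Replay the same random operations on both stores and compare reads."""
    rng = random.Random(seed)
    for _ in range(steps):
        key = bytes([rng.randrange(16)])  # small key space forces overwrites
        op = rng.choice(("put", "delete", "get"))
        if op == "put":
            value = bytes(rng.randrange(256) for _ in range(8))
            a.put(key, value)
            b.put(key, value)
        elif op == "delete":
            a.delete(key)
            b.delete(key)
        if a.get(key) != b.get(key):
            return False
    return True
```

Calling backends_agree(DictStore(), SqliteStore()) replays one seeded
operation sequence on both backends; it says nothing about behaviour
under crashes, concurrency or inputs the test never generates.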

I think people are falling into a trap of thinking "It's a ,
I know a  for that!"; but the application and needs are
very specialized here; no less than, say-- the table of pre-computed
EC points used for signing in the ECDSA application. It just so
happens that on the back of the very Bitcoin-specific cryptographic
consensus algorithm there was a slot where a pre-existing high
performance key-value store fit; and so we're using one and saving
ourselves some effort.  If, in the future, Bitcoin Core adopts a
merkelized commitment for the UTXO it would probably need to stop
using any off-the-shelf key value store entirely, in order to avoid a
20+ fold write inflation from updating hash tree paths (And Bram Cohen
has been working on just such a thing, in fact).
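
One plausible reading of the write inflation figure, sketched in
Python: in a binary hash tree, changing one leaf means rewriting every
hash on the path to the root, so a single logical update becomes
roughly log2(N) + 1 node writes. The tree layout, the use of SHA-256
and the leaf counts below are assumptions for illustration, not a
description of any proposed commitment scheme.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Toy binary hash tree; leaf count must be a power of two."""

    def __init__(self, leaves: List[bytes]) -> None:
        n = len(leaves)
        assert n > 0 and n & (n - 1) == 0, "sketch needs a power-of-two size"
        self.levels: List[List[bytes]] = [[h(x) for x in leaves]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, value: bytes) -> int:
        """Replace one leaf; return how many tree nodes were rewritten."""
        self.levels[0][index] = h(value)
        writes = 1
        for depth in range(1, len(self.levels)):
            index //= 2
            below = self.levels[depth - 1]
            self.levels[depth][index] = h(
                below[2 * index] + below[2 * index + 1])
            writes += 1
        return writes


def writes_per_update(num_leaves: int) -> int:
    # One leaf plus one node per level above it (power-of-two sizes).
    return (num_leaves - 1).bit_length() + 1


tree = MerkleTree([bytes([i]) for i in range(8)])
writes = tree.update(3, b"new coin")
```

With 8 leaves a single update rewrites 4 nodes, and the same formula
gives 21 for a tree of 2**20 leaves, whereas a plain key-value store
would write one record.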


Re: [bitcoin-dev] [patch] Switching Bitcoin Core to sqlite db

2015-10-29 Thread Simon Liu via bitcoin-dev
Storage of UTXO data looks like an implementation detail and thus one
would have thought that the choice of database would not increase the
odds of consensus protocol failure.

Btcd, a full node implementation written in Go, already provides a
database interface which supports different backends:

https://github.com/btcsuite/btcd/tree/master/database

Given that UTXO storage is considered critical, it seems reasonable to
let a node operator decide for themselves if they want data stored in
LevelDB (which is not fully ACID compliant) or a database like SQLite,
Oracle, DB2, etc.

If the storage requirements for UTXO data are fairly simple, consisting
mainly of puts and gets, there is a decent argument that using a
dedicated key-value store provides superior performance over a
traditional SQL database.

However, from a practical perspective, given that nodes operate on a
range of different hardware and even a little Raspberry Pi can run a
full node and keep up with the network, why not let those users with the
resources to operate big iron databases do so?  It would be a good
feature to have.


On 10/29/2015 01:03 AM, Luke Dashjr via bitcoin-dev wrote:
> I predict this would be a disaster. UTXO storage is CONSENSUS-CRITICAL code.
> Any divergence in implementation behaviour, including bugs AND bugfixes, may 
> cause consensus failure. For this to have a reasonable *hope* of working, we 
> need to choose one storage engine, and *will* need to maintain consensus-
> compatibility of it ourselves (since nobody else cares).
> 
> Fixing LevelDB frankly seems like an easier task than switching to anything 
> SQL-based, which would require a *lot* more *difficult-to-get-consensus-
> compatible* code that we are all (or at least mostly) very unfamiliar with.
> 
> Research is fine, but let's be realistic about deployment.
> 
> Luke