Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?

Qu Wenruo Mon, 01 Dec 2014 17:18:07 -0800


-------- Original Message --------
Subject: Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?
From: Robert White <rwh...@pobox.com>

To: Qu Wenruo <quwen...@cn.fujitsu.com>, linux-btrfs<linux-btrfs@vger.kernel.org>

Date: 2014-12-02 02:10

On 11/30/2014 10:18 PM, Qu Wenruo wrote:
(advocacy for using SQL internally for btrfsck)
All of these ideas you want to toss an entire SQL front end on are more simply handled with simple data structures.
In C++ terms "map<inode,parent>" and/or "map<parent,vector<children>>" beats the heck out of including all of SQL and its related indexes and type conversions (sqlite, for example, stores integers as doubles, or decimal numbers depending on version).
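A minimal sketch of the two maps named above, assuming inode numbers are 64-bit integers. The names InodeNum, InodeIndex and add_link are hypothetical and are not taken from btrfs-progs, and the child-to-parent map keeps only one parent per inode, so hard links are ignored here.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical alias for illustration; not a btrfs-progs type.
using InodeNum = std::uint64_t;

// Hypothetical container holding both directions of the relation.
struct InodeIndex {
    std::map<InodeNum, InodeNum> parent_of;                 // child -> parent
    std::map<InodeNum, std::vector<InodeNum>> children_of;  // parent -> children

    // Record one parent/child link in both maps.
    void add_link(InodeNum parent, InodeNum child) {
        parent_of[child] = parent;
        children_of[parent].push_back(child);
    }
};
```

Both lookups are then ordinary O(log n) tree searches on an integer key, with no statement parsing or type conversion involved.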
RDBMS _are_ good at representing things, so noticing that a thing _can_ be represented with an RDBMS is very common.
But by the time you put two or three indexes on relation->(parent, child, name) you've given yourself three or four copies of the core data in three or four different places. And those copies are largely immutable and randomly distributed and will include the overhead in memory for fairly sparse trees.
It's not that it's an unworkable idea.
But it is unnecessarily generic and adds an order of magnitude of complexity to your problems.
For instance, if I boot from a CD to run a btrfsck where will the database files be written to?

This is easy: memory.
We would only fall back to a file when we judge the fs metadata to be too large.

One of the problems with the current inode_record is that btrfsck can only keep all the records in memory, so when the file system's metadata is too big, the sysadmin's only option is to add swap space or memory.

It is not an urgent problem, though, since a 1T btrfs fs with about 5G of metadata only takes about 500M when checking chunks and extents, and even less when checking fs roots.

If it is an in-memory table why do I want the overhead of SQL to look up something indexed by integer?
If the sparse vectors of integers don't fit in memory why would the SQL tables of integers fit "better"?
SQL would be the second slowest possible for representing this data -- The slowest would be an XML schema stored as flat text.
So your crazy idea is also a pretty bad one compared to most if not all sparse data representations and techniques that come to bear on this problem set. All you are really doing is pushing the same work (walking a tree to find an integer) into a difficult "spell it out in SQL" space.
Is prepare_sql(cursor,"SELECT parent FROM parantage_tree WHERE child = %d"); execute_sql(cursor,child); and its possible error returns actually clearer or better than "parent=inheretance.find(child); if(parent!=inheretance.end()) {...}" (as it might be written in C++)?
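For comparison, the native lookup quoted above might be fleshed out roughly as follows. This is only a sketch: lookup_parent and InodeNum are hypothetical names, and the map is spelled inheritance here.

```cpp
#include <cstdint>
#include <map>

// Hypothetical alias for illustration; not a btrfs-progs type.
using InodeNum = std::uint64_t;

// Look up the parent of `child` in a child -> parent map.
// Returns true and fills `parent_out` when the child is known;
// returns false and leaves `parent_out` untouched otherwise.
bool lookup_parent(const std::map<InodeNum, InodeNum>& inheritance,
                   InodeNum child, InodeNum& parent_out) {
    auto it = inheritance.find(child);
    if (it == inheritance.end()) {
        return false;
    }
    parent_out = it->second;
    return true;
}
```

The only failure mode is "not found", reported through the return value, so there is no cursor state or SQL error code to check.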
Do you want to know if (keep track of whether) an inode is allocated and referenced? There's a sparse bit-vector for that...
Want to be able to get back to an inode's location on disk? A sparse array of disk offsets exists (among other options).
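One possible shape for such a sparse bit-vector, sketched under the assumption that set bits cluster into relatively few 64-bit words. SparseBitVector and InodeTracker are hypothetical names, not existing btrfsck structures, and a hash map of words is just one of several ways to get sparseness.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>

// Hypothetical sparse bit-vector: only 64-bit words that contain at
// least one set bit are stored, keyed by word index.
class SparseBitVector {
public:
    void set(std::uint64_t bit) {
        words_[bit / 64] |= (std::uint64_t{1} << (bit % 64));
    }

    bool test(std::uint64_t bit) const {
        auto it = words_.find(bit / 64);
        return it != words_.end() &&
               ((it->second >> (bit % 64)) & std::uint64_t{1}) != 0;
    }

    // Number of 64-bit words actually allocated.
    std::size_t word_count() const { return words_.size(); }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> words_;
};

// Hypothetical pairing of the two structures mentioned above.
struct InodeTracker {
    SparseBitVector referenced;                          // inode seen and referenced?
    std::map<std::uint64_t, std::uint64_t> disk_offset;  // inode -> byte offset on disk
};
```

Memory use grows with the number of distinct words touched rather than with the largest inode number.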
Before you can even access the RDBMS you'd have to fill it completely; otherwise you wouldn't know if a select returning zero rows was an authoritative indication that the datum didn't exist or if it was instead an indication that the datum hadn't been populated yet.
THIS IS NOT SARCASM: If you strongly disagree, I suggest you start coding. Seriously, don't ask, do... And in a month really check to see if your solution is any smaller, faster, easier, or in _any_ _way_ more optimal than using native data structures. The attempt will answer the question definitively and then we'll all know...

I know this is a crazy idea, and I don't disagree with your opinion.

But I am also somewhat tired of introducing new structures and new search functions, or even making larger changes to the btrfsck record infrastructure, whenever I find that it can't provide what a new recovery function needs.

In fact, after I finish the whole corrupted-leaf recovery patchset, I may try to implement it as an experimental trial-and-error cleanup/enhancement of the inode_record infrastructure, and see whether there is a huge performance drop or a reduction in lines of code. (Anyway, it is just a personal experiment; I will not send the patches if there is no interesting result, and it may well turn out to be a disaster, as you mentioned.)

Thanks,
Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
