Ross Younger wrote:
I've included the current version of my proposed design, although following
the conversation with Simon this week it has occurred to me that it might
well be reasonable to fillet out the core idea and make it available as a
simple proto-filesystem. (Essentially it would be a block device with very
large blocks. The high-level operations provided would be reading and
writing up to a blockful, and erasing a whole block; this doesn't seem to
fit well with the eCos block device model, so I suspect it would be better
off as its own interface.) Please do speak up if that would be a useful
development, and if so I'll happily rework this proposal into two parts as
time allows.
I think there would be merit in doing something like that. I agree a block
device interface wouldn't be appropriate.
============================================================================
NOR-in-NAND design (3/7/09)
1. Scenario and assumptions:
We want RedBoot to be able to use NAND flash as if it was NOR flash,
for FIS and fconfig.
We don't care about supporting non-RB apps, so can rely on RB's behaviour.
Not sure that is wise. And I'm not sure what specific behaviour you would
wish to rely on. I wouldn't make the code itself dependent on RedBoot in
any case. That's what got us into a mess with the v1 flash drivers.
Ultimately the RedBoot FIS approach is not something we'd like to continue
with - it was designed for the common case of the late '90s/early '00s of
single NOR flash parts (and in some aspects, isn't so great even for then,
such as not being able to support bootblocks properly). That code has
limited lifespan, and many flaws. It's better to do something that can
last beyond that.
I can certainly see this layer being used for a non-RedBoot boot loader.
Lots of people may use RedBoot during development, but this sort of layer
is useful for any boot loader. Far fewer people want to use RedBoot for
their final product (outside of development) - they just want to load
their apps and go. I'm not suggesting writing a boot loader to do this
now, but I do believe it would be a mistake to think that people won't
want to.
Analysis:
* fconfig-in-FIS always calls FIS code.
* fconfig not in FIS always erases before programming.
* FIS always erases before it programs, only deals with block-aligned
regions, and never rewrites within such a region. [+]
Design goals:
* Clean and simple, in keeping with the eCos philosophy.
* Robust: copes with blocks going bad and indulges in some sort of
wear-levelling.
* No use of malloc. (As we're targeting only RB, we can use the workspace.)
The workspace is a very crude way to allocate memory, but if you can
constrain yourself to it, so much the better.
The number of blocks available - i.e. (1 + maximum valid logical block
number) - will be computed as:
* Number of physical blocks in the chip or partition,
* minus the number of factory bad blocks,
* minus one (to allow for robust block rewrites),
* minus an allowance for bad blocks to develop during the life
of the device.
Of course a (NOR) flash driver has to specify a constant device size,
whereas the bad blocks are board specific, so really you have to have an
allowance for bad blocks full stop.
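For concreteness, the computation might be sketched like this in C. The
function name and the fixed lifetime-allowance policy are invented for the
example; per the point above, in practice the allowance would have to cover
factory bad blocks as well if the layer has to report a constant size.

    #include <stdint.h>

    /* Illustrative only: derive the number of logical blocks exposed by
     * the NOR-in-NAND layer from a partition's geometry. */
    static uint32_t
    nor_in_nand_logical_blocks(uint32_t physical_blocks,
                               uint32_t factory_bad_blocks)
    {
        uint32_t lifetime_allowance, reserved;

        /* Headroom for blocks that go bad in service; 2% (minimum 2) is
         * an assumed policy for the sake of the example. */
        lifetime_allowance = physical_blocks / 50;
        if (lifetime_allowance < 2)
            lifetime_allowance = 2;

        reserved = factory_bad_blocks    /* already marked bad        */
                 + 1                     /* spare for robust rewrites */
                 + lifetime_allowance;   /* expected to fail later    */

        return (physical_blocks > reserved) ? (physical_blocks - reserved)
                                            : 0;
    }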
The NOR-in-NAND layer will be making multiple-page reads and writes;
it may be worthwhile to put code to do this into the NAND layer, as
opposed to forcing all users to reinvent the wheel.
Sounds like a better approach.
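As a rough illustration, such a helper in the NAND layer could look
something like the following. The cyg_nandlib_* names and signatures are
placeholders, not the actual eCos NAND library API.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical single-page primitive assumed to exist in the NAND
     * layer; name and signature are placeholders. */
    extern int cyg_nandlib_read_page(void *dev, uint32_t page, void *buf,
                                     size_t page_size);

    /* Sketch of a multi-page read helper living in the NAND layer itself,
     * so that each client doesn't have to reinvent the loop. */
    static int
    cyg_nandlib_read_pages(void *dev, uint32_t first_page, uint32_t npages,
                           uint8_t *buf, size_t page_size)
    {
        uint32_t i;
        for (i = 0; i < npages; i++) {
            int rc = cyg_nandlib_read_page(dev, first_page + i,
                                           buf + (size_t)i * page_size,
                                           page_size);
            if (rc < 0)
                return rc;   /* propagate ECC/IO errors to the caller */
        }
        return 0;
    }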
NAND blocks are used as a dumb datastore, with their physical addresses
bearing no relation to their logical addresses. In-use blocks are tagged
in the OOB area with the logical block number they refer to (see below).
Physical storage blocks are used sequentially from the beginning of
the device, then cycling back to the start to reuse erased blocks. This
is managed by maintaining a next-write "pointer". At runtime, this is
the next vacant block after the last written block; at boot time, it
is initialised by scanning the filesystem to find the block with the
highest serial number, which is taken to be the latest-written block.
That sounds potentially painful for non-trivial numbers of blocks, unless
you are going to use a large logical block size. More below.
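For reference, a minimal sketch of that boot-time initialisation, assuming
the tag layout from 2.1.1 below and a hypothetical nin_read_tag() helper
that fetches a block's tag from its first page's OOB area:

    #include <stdint.h>

    /* Illustrative OOB tag, matching the layout described in 2.1.1. */
    struct nin_tag {
        uint16_t magic;     /* 0xEF15               */
        uint16_t logical;   /* logical block number */
        uint32_t serial;    /* master serial number */
    };

    /* Hypothetical: returns 0 on success, <0 if the physical block is
     * bad, erased or not one of ours. */
    extern int nin_read_tag(void *dev, uint32_t pblock, struct nin_tag *tag);

    /* Boot-time scan: find the block carrying the highest serial number
     * (taken to be the latest-written block); the next-write pointer
     * starts just after it. Skipping bad or in-use blocks when advancing
     * the pointer is omitted here for brevity. */
    static uint32_t
    nin_find_next_write(void *dev, uint32_t nblocks)
    {
        uint32_t pblock, latest = 0, best_serial = 0;
        int seen = 0;

        for (pblock = 0; pblock < nblocks; pblock++) {
            struct nin_tag tag;
            if (nin_read_tag(dev, pblock, &tag) < 0 || tag.magic != 0xEF15)
                continue;
            if (!seen || tag.serial > best_serial) {
                best_serial = tag.serial;
                latest = pblock;
                seen = 1;
            }
        }
        return seen ? (latest + 1) % nblocks : 0;
    }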
This scheme is much better than a simple block mapping due to its
robustness: it intrinsically provides reasonable wear-levelling,
Not much if most/all blocks are used. I suspect with the normal usage
pattern of flash under RedBoot, you wouldn't get much wear-levelling.
You'd need to start occasionally reallocating used blocks too to
wear-level properly, which admittedly probably wouldn't be /that/
difficult. That said, I think we should indeed probably just put up with
theoretically inadequate wear-levelling for now.
Conversely, note that MLC NAND may start to become more common, but it is
(I think) rated for ~10K cycles which may increase the need for more
thorough wear-levelling.
2.1.1 OOB tag format
The tag is written into the OOB area of the first page of each block
(as "application" OOB data, from the NAND library's point of view).
The tag is a packed byte array, in processor-local endian, with the
following contents:
* Magic number - 2 bytes, 0xEF15. (This is a compile-time constant
and demonstrates that the block is one of ours. It's a corruption
of "eCos FIS".)
* Logical block number - 2 bytes.
* Master serial number - 4 bytes (see below).
Given you say processor-local endian, I assume you mean there are two
16-bit words and a 32-bit word here.
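Assuming that reading - two 16-bit fields and one 32-bit field, in local
endianness - a packed C representation might look like this; the names are
illustrative only:

    #include <stdint.h>
    #include <string.h>

    #define NIN_TAG_MAGIC 0xEF15u

    /* 8-byte tag written to the OOB area of a block's first page, stored
     * in processor-local endianness as described above. */
    struct nin_oob_tag {
        uint16_t magic;     /* NIN_TAG_MAGIC: marks the block as ours */
        uint16_t logical;   /* logical block number this block holds  */
        uint32_t serial;    /* master serial number                   */
    } __attribute__((packed));

    /* Sketch: serialise the tag into the caller's OOB buffer. */
    static void
    nin_tag_pack(const struct nin_oob_tag *tag, uint8_t oob[8])
    {
        memcpy(oob, tag, sizeof(*tag));   /* local endian, per the design */
    }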
Given you are assuming partial-page writes, I think you can do something
more intelligent here to handle the seeking through NAND space that your
proposal entails for every read/write:
- For a start, the serial number seems potentially overkill unless I'm
missing something. All you need to know is whether a discovered logical
block number is the most recent version of it. The serial only needs to
reflect that block. When you write a new revision of a block, you mark the
previous one dead by overwriting it with a partial write (without
erasure). Thus you only have one valid version of a block at one time.
Duplicates are dealt with solely at initial device scan time (stomping on
the old one at that point). This way you only need 2 bits to represent the
serial, theoretically (as the difference between serials can only be 1, so
you can always tell which is older).
- That frees up space which we can use for potential optimisations. In
particular, the common use-case we are envisaging is wholly sequential
reads of fairly large images. So we could use 2 bytes to point to the next
block in the logical block chain. This is very useful if most use is
sequential. If that block turns out not to be the correct block number, we
lose very little and just revert to scanning the medium. Most times it
should be correct. (We could make this behaviour a CDL option anyway).
This does mean knowing which block will be used next at the time you are
writing the current block, but that doesn't seem to be much of an issue -
it's primarily just bringing forward a determination you'd have to make
anyway. None of this seems a particularly hard-to-implement optimisation
(both ideas are sketched below).
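A rough sketch of both ideas - a 2-bit serial, a 'dead' marker byte for the
partial-write obsoleting, and a next-block hint - follows. All names and
the exact field layout are invented for illustration; the real layout would
be whatever the freed-up OOB bytes allow.

    #include <stdint.h>

    /* Illustrative alternative tag using the freed-up bytes. */
    struct nin_oob_tag_v2 {
        uint16_t magic;      /* 0xEF15                                   */
        uint16_t logical;    /* logical block number                     */
        uint8_t  serial2;    /* 2-bit serial, values 0..3                */
        uint8_t  dead;       /* 0xFF = live; partially overwritten to 0
                              * (no erase) to mark the block obsolete    */
        uint16_t next_hint;  /* physical block expected to hold the next
                              * logical block in sequence                */
    };

    /* With a 2-bit serial the difference between two live copies of the
     * same logical block can only ever be 1, so modulo-4 arithmetic is
     * enough to decide which copy is newer. */
    static int
    nin_serial_is_newer(uint8_t a, uint8_t b)
    {
        return ((a - b) & 0x3u) == 1u;   /* true if a was written after b */
    }

    /* Hypothetical tag reader, as before. */
    extern int nin_read_tag_v2(void *dev, uint32_t pblock,
                               struct nin_oob_tag_v2 *tag);

    /* Sequential read path: try the hint first; if the hinted block does
     * not carry the wanted logical number, fall back to scanning. */
    static int
    nin_hint_matches(void *dev, uint16_t hint, uint16_t wanted_logical)
    {
        struct nin_oob_tag_v2 tag;
        if (nin_read_tag_v2(dev, hint, &tag) < 0)
            return 0;
        return tag.magic == 0xEF15 && tag.dead == 0xFF
               && tag.logical == wanted_logical;
    }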
I should note though that multiple writes are not supported on newer MLC
NAND flash. This could be an issue as this class of NAND may become more
common. Perhaps in that case an obsoleted block can just be erased
immediately.
Also, if scanning the medium for each block isn't as slow as I fear it may
be, then the above may be unnecessary (although there's still a good
argument for freeing up the OOB bits for later use, if they can be freed).
To be safe, we should impose an upper
limit on the number of physical NAND blocks that this system will use,
and hence cap the number of logical NOR blocks the system will support. I
suggest 1024, which ought to be enough for anybody; it's more than many
(most?) NOR chips.
The number of blocks should be configurable anyway, so I don't think we
need go beyond that surely? Setting a default of 1024 for such an option
should be adequate.
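Something like the following, say; the option name is invented for
illustration, and in a real package it would be a CDL option rather than a
bare #define:

    /* Illustrative compile-time cap with a 1024-block default. */
    #ifndef CYGNUM_NOR_IN_NAND_MAX_BLOCKS
    # define CYGNUM_NOR_IN_NAND_MAX_BLOCKS 1024
    #endif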
2.4 Runtime considerations
If a runtime performance boost was required, the system could on startup
scan the NAND partition and build up a cached mapping in-RAM of the
physical addresses of each (non-zeroed) logical block. However, we don't
expect this code will be used on gigantic NAND arrays [partitions],
Hmm. I'm not as certain about that. People like having lots of space to
play in, with e.g. multiple app versions or linux kernel images or root
fs's or initrd's to load etc. A linear scan will work ok on a brand new
board for a while - starting the scan for the next block from the current
block of course - but in due course, performance would deteriorate, mostly
irreversibly.
and it should only see light duty via RedBoot.
For a production system sure, but less so on a developer's board, with
apps frequently getting written/rewritten. In particular the FIS directory
updates will get interspersed frequently as a result which will cause
increasing fragmentation. Put it like this - if you're considering
something where wear-levelling is a concern (and for the mooted proto-fs,
that's certainly valid), then you'd definitely need to consider the time
spent scanning on every block read, as the block mappings will drift
further away from 1:1 logical to physical, and stop being linear.
Therefore it won't take
long to linear-scan for blocks during operations, so this optimisation
may not be worth its complexity.
And 4Kbytes RAM (for, say, 1024 blocks).
(Bart suggested that a scan at startup could also take care of the cases
above where more than one physical block is tagged with the same logical
block number, hence reducing the time and complexity of the block access
code. There's a startup time vs access time vs memory trade-off here,
though we haven't fully analysed it. Nick agreed that we ought to think
this through very carefully.)
I think that being forced as a matter of course to scan the whole medium
on _every_ block read is a bad thing and should be avoided.
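For what it's worth, a sketch of the startup-scan cache being discussed,
assuming the 2.1.1 tag format and a 1024-block cap; at 4 bytes per entry
that's the 4Kbytes mentioned above. nin_read_tag() is again hypothetical.

    #include <stdint.h>

    #define NIN_MAX_BLOCKS 1024u        /* illustrative cap, cf. CDL option */
    #define NIN_NO_BLOCK   0xFFFFFFFFu  /* logical block has no copy yet    */

    struct nin_tag {
        uint16_t magic;     /* 0xEF15               */
        uint16_t logical;   /* logical block number */
        uint32_t serial;    /* master serial number */
    };

    extern int nin_read_tag(void *dev, uint32_t pblock, struct nin_tag *tag);

    /* One scan at startup fills this table (~4Kbytes at 1024 entries);
     * thereafter a block read is a single lookup rather than a scan of
     * the medium. */
    static uint32_t nin_l2p[NIN_MAX_BLOCKS];

    static void
    nin_build_map(void *dev, uint32_t physical_blocks)
    {
        uint32_t i;
        for (i = 0; i < NIN_MAX_BLOCKS; i++)
            nin_l2p[i] = NIN_NO_BLOCK;

        for (i = 0; i < physical_blocks; i++) {
            struct nin_tag tag;
            if (nin_read_tag(dev, i, &tag) < 0 || tag.magic != 0xEF15)
                continue;
            if (tag.logical < NIN_MAX_BLOCKS)
                nin_l2p[tag.logical] = i;
            /* Duplicate logical numbers (stale revisions) would be
             * resolved here too, e.g. by comparing serials and stomping
             * on the loser, as Bart suggested. */
        }
    }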
Something you may want to think about is that there are problems that
RedBoot's FIS and config code have with multiple flash devices in a
system. This may happen, whether with multiple NANDs or a mixture of flash
types - possibly increasingly so these days. Work by myself and IIRC to
some extent Bart at eCosCentric has ameliorated problems somewhat, but not
fixed them. RedBoot's FIS code orients itself around a single flash
device, with a single fixed block size for the entirety of that device.
You may stumble across problems here as NAND may not be the only flash
device on many boards, so it's something to bear in mind.
Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine