Re: Grepping through a disk

2013-03-04 Thread Warren Block

On Mon, 4 Mar 2013, Polytropon wrote:


The file size of the file I'm searching for is less than 10 kB.
It's a relatively small text file which got some subsequent
additions in the last days, but hasn't been part of the backup
job yet.


There have been some good suggestions.  I would use a large buffer with 
dd, say 1M or more, both for speed and to reduce the chance of hitting 
only part of the search string.
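A minimal sketch of that large-buffer loop as a shell function (the device path and the search string are placeholders for the thread's elided values, not fixed names):

```shell
# scan_disk IMG PATTERN -- scan IMG in 1 MB units and report the first
# unit that contains PATTERN.  The device (/dev/ad6 in the thread) and
# the search string are placeholders for the elided originals.
scan_disk() {
    img=$1
    pattern=$2
    n=0
    tmp=$(mktemp) || return 1
    while :; do
        # bs=1048576 is 1 MB; FreeBSD's dd also accepts the shorthand bs=1m
        dd if="$img" of="$tmp" bs=1048576 count=1 skip="$n" 2>/dev/null
        if [ ! -s "$tmp" ]; then              # empty read: past end of input
            rm -f "$tmp"; echo "no match"; return 1
        fi
        if grep -qa "$pattern" "$tmp"; then   # -a: search binary data as text
            rm -f "$tmp"; echo "match in 1 MB unit $n"; return 0
        fi
        n=$((n + 1))
    done
}
# usage: scan_disk /dev/ad6 'some distinctive text'
```

Compared to the 10 kB version, this starts roughly a hundredth as many dd/grep process pairs over the same disk.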


For the future, look at sysutils/rsnapshot.  Easy to set up, 
space-efficient, and provides an easily-accessed file history.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"


Re: Grepping through a disk

2013-03-04 Thread C. P. Ghost
On Mon, Mar 4, 2013 at 1:36 AM, Polytropon  wrote:
> Any suggestion is welcome!

How about crawling the metadata, locating each block
that is already allocated, and skip those blocks when you
scan the disk? That could reduce the searching space
significantly. blkls(1) et al. from the Sleuth Kit are your friends.

Good luck,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: Grepping through a disk

2013-03-04 Thread Julian H. Stacey
Hi Polytropon & cc questions@

> Any suggestion is welcome!

Ideas:
A themed list: freebsd...@freebsd.org

There's a bunch of fs tools in /usr/ports/sysutils/

My http://www.berklix.com/~jhs/src/bsd/jhs/bin/public/slice/
 slices large images such as tapes & disks
 (also the slice names would give numbers convertible to offsets
   probably useful to eg ..a)
man fsdb

A bit of custom C should run a lot faster than shells & greps, eg 
when I was looking for nasty files from a bad scsi controller, I wrote
http://www.berklix.com/~jhs/src/bsd/jhs/bin/public/8f/

One could run eg slice asynchronously & suspend ^Z when you run out of 
space, & periodically run some custom C (like 8f.c) or some find grep -v rm  
loop to discard most slices as of no interest. Then resume slicing.

OK, that's doing writes too, so slower than just reading & a later dd with 
seek=whatever; depends how conservative one's feeling about doing reruns
with other search criteria.

You mentioned risk of text string chopped across a slice/block boundary.
Certainly a risk. Presumably the solution is to search twice:
the 2nd time after a dd with a half block/slice-size offset, 
then slice/search again.

If you run out of space to do that, you might write a temporary
disklabel/bsdlabel with an extra partition with a half block offset
.. dodgy stuff that, do it while you're wide awake :-)

Always a pain, these scenarios, losing hours of human & CPU time. I hope
the data's worth it, good luck. 

Cheers,
Julian
-- 
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
 Reply below not above, like a play script.  Indent old text with "> ".
 Send plain text.  No quoted-printable, HTML, base64, multipart/alternative.


Re: Grepping through a disk

2013-03-04 Thread Polytropon
On Mon, 4 Mar 2013 11:29:00 +, Steve O'Hara-Smith wrote:
> On Mon, 4 Mar 2013 12:15:24 +0100
> Polytropon  wrote:
> 
> > But I don't know how to do this. From reading "man dd"
> > my impression (consistent with my experience) is that
> > the option skip= operates in units of bs= size, so I'm
> > not sure how to compose a command that reads units of
> > 1 MB, but skips in units of 950 kB. Maybe some parts of
> > my memory have also been marked "unused" by fsck. :-)
> 
>   Not too hard (you'll kick yourself when you read down) - translation
> to valid shell script is left as an exercise for the reader :)
> 
>  bs=50k count=(n*20) skip=(n*20 - 1)
> 
>   Probably nicer to use powers of 2
> 
> bs=64k count=(n*16) skip=(n*16 - 1)

Thanks for the pointer. I was so concentrated on finding
the answer within dd that I hadn't thought about that.
It's easy to write this in shell code. As a conclusion,
I will apply for further IQ reduction; it seems that I have
enough spare brain power I don't use anyway. :-)



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: Grepping through a disk

2013-03-04 Thread Steve O'Hara-Smith
On Mon, 4 Mar 2013 12:15:24 +0100
Polytropon  wrote:

> But I don't know how to do this. From reading "man dd"
> my impression (consistent with my experience) is that
> the option skip= operates in units of bs= size, so I'm
> not sure how to compose a command that reads units of
> 1 MB, but skips in units of 950 kB. Maybe some parts of
> my memory have also been marked "unused" by fsck. :-)

Not too hard (you'll kick yourself when you read down) - translation
to valid shell script is left as an exercise for the reader :)

 bs=50k count=(n*20) skip=(n*20 - 1)

Probably nicer to use powers of 2

bs=64k count=(n*16) skip=(n*16 - 1)
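One way to realize the overlapping-window idea as a shell function (a sketch, not exactly the formula above: it reads 16 × 64 kB per window but advances only 15 blocks per step, so adjacent windows share one 64 kB block; the device and pattern are placeholders for the thread's elided values):

```shell
# scan_overlap IMG PATTERN -- read 16 x 64 kB (1 MB) per window, but
# advance only 15 blocks (960 kB) per step, so adjacent windows share
# one 64 kB block and a short search string can never straddle a
# window boundary.  Device and pattern are placeholders.
scan_overlap() {
    img=$1
    pattern=$2
    n=0
    tmp=$(mktemp) || return 1
    while :; do
        dd if="$img" of="$tmp" bs=65536 count=16 skip=$((n * 15)) 2>/dev/null
        if [ ! -s "$tmp" ]; then              # empty read: end of input
            rm -f "$tmp"; echo "no match"; return 1
        fi
        if grep -qa "$pattern" "$tmp"; then
            rm -f "$tmp"
            echo "match near byte $((n * 15 * 65536))"
            return 0
        fi
        n=$((n + 1))
    done
}
# usage: scan_overlap /dev/ad6 'some distinctive text'
```

The 64 kB overlap is far larger than any plausible search string, so a hit can never be lost to a window boundary.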

-- 
Steve O'Hara-Smith 


Re: Grepping through a disk

2013-03-04 Thread Polytropon
On Mon, 04 Mar 2013 04:15:48 -0600, Joshua Isom wrote:
> I'd call bs= essential for speed.  Any copying will be faster with 
> something higher.

I thought about that. Narrowing down _if_ something has
been found is easy, e. g. when the positive 1 MB unit is dd'ed
to a file, further work can easily be applied.



> Also, there's the possibility, very annoying, that 
> your search string overlaps a place where you read.  I'd probably check 
> 1M blocks, but advance maybe 950k each time.

I also thought about that, that's why the distinctive
phrase I'm searching for is less than 10 characters long.
Still it's possible that it appears "across" a boundary
of units, no matter how big or small I select bs=.

But I don't know how to do this. From reading "man dd"
my impression (consistent with my experience) is that
the option skip= operates in units of bs= size, so I'm
not sure how to compose a command that reads units of
1 MB, but skips in units of 950 kB. Maybe some parts of
my memory have also been marked "unused" by fsck. :-)



> Make sure you're reading 
> from block offsets for maximum speed.

How do I do that? The disk is a normal HDD which has been
initialized with "newfs -U" and no further options.

ad6: 953869MB 
at ata3-master UDMA100 SATA 1.5Gb/s

The file system spans the whole disk.



> I know disk editors exist, I 
> remember using one on Mac OS 8.6 to find a lost file.  That was back on 
> a 6 gig hard drive.

Ha, I've done stuff like that on DOS with "important business
data" many years ago, using the "Norton Disk Doctor" (NDD.EXE)
when Norton (today: Symantec) wasn't yet synonymous with "The
Yellow Plague". This program actually was quite cool, and you
could search for things, manipulate disks on several levels
(files, file system and below). I had even rewritten an entire
partition table from scratch, from memory and a handheld calculator,
after an OS/2 installation went crazy. :-)



> Depending on the file size, you could open the disk in vi and just 
> search from there, or just run strings on the disk and pipe it to vi.

You mean like "strings /dev/ad6 | something", without dd? That
would give me _no_ progress indicator (with my initial approach
I have increasing numbers at least), but I doubt I can load a
1 TB partition in a vi session with only 2 GB RAM in the machine.

If I try "strings /dev/ad6" I get a warning: "strings: Warning:
'/dev/ad6' is not an ordinary file". True. But this opens a
useful use of cat: "cat /dev/ad6 | strings". (Interesting idea,
I will investigate this further.)
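A sketch of that pipeline as a function (strings(1) refuses a device node named on its command line, but happily reads one from a pipe; the device and pattern are placeholders for the thread's elided values):

```shell
# scan_strings IMG PATTERN -- pipe the raw device through strings(1).
# strings refuses a device node on its command line, but reads a pipe
# fine; -t d prefixes each printable string with its decimal byte
# offset, which here equals the offset on the device.  The device and
# pattern are placeholders.
scan_strings() {
    dd if="$1" bs=1048576 2>/dev/null | strings -t d | grep -- "$2"
}
# usage: scan_strings /dev/ad6 'some distinctive text'
```

The offsets reported by -t d make it easy to go back with dd and extract the surrounding data once a hit is found.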

The file size of the file I'm searching for is less than 10 kB.
It's a relatively small text file which got some subsequent
additions in the last days, but hasn't been part of the backup
job yet. I can only remember parts of those additions, because
as I said my brain is not good with computers. :-)

Or do you think of something different? If yes, please explain.
The urge to learn is strong when something went wrong. :-)


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: Grepping through a disk

2013-03-04 Thread Joshua Isom

On 3/3/2013 6:36 PM, Polytropon wrote:

Due to a fsck file system repair I lost the content of a file
I consider important, but it hasn't been backed up yet. The
file name is still present, but no blocks are associated
(file size is zero). I hope the data blocks (which are now
probably marked "unused") are still intact, so I thought
I'd search for them because I can remember specific text
that should have been in that file.

As I don't need any fancy stuff like a progress bar, I
decided to write a simple command, and I quickly got
something up and running which I _assume_ will do what
I need.

This is the command I've been running interactively in bash:

$ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 
skip=${N} 2>/dev/null | grep ""; if [ $? -eq 0 ]; then break; fi; N=`expr ${N} + 
1`; done

To make it look a bit better and illustrate the simple
logic behind my idea:

N=0
while true; do
echo "${N}"
dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} \
2>/dev/null | grep ""
if [ $? -eq 0 ]; then
break
fi
N=`expr ${N} + 1`
done

Here  refers to the text. It's only a small, but
very distinctive portion. I'm searching in blocks of 10 kB
so it's easier to continue in case something has been found.
I plan to output the resulting "block" (it's not a real disk
block, I know, it's simply a unit of 10 kB disk space) and
maybe the previous and next one (in case the file, the _real_
block containing the data, has been split across more than
one of those units). I will then clean the "garbage" (maybe
from other files) because I can easily determine the beginning
and the end of the file.

Needless to say, it's a _text_ file.

I understand that grep operates on text files, but it will
also happily return 0 if the text to search for will appear
in a binary file, and possibly return the whole file as a
search result (in case there are no newlines in it).

My questions:

1. Is this the proper way of stupidly searching a disk?

2. Is the block size (bs= parameter to dd) good, or should
I use a different value for better performance?

3. Is there a program known that already implements the
functionality I need in terms of data recovery?

Results so far:

The disk in question is a 1 TB SATA disk. The command has
been running for more than 12 hours now and returned one
false-positive result, so basically it seems to work, but
maybe I can do better? I can always continue search by
adding 1 to ${N}, set it as start value, and re-run the
command.

Any suggestion is welcome!





I'd call bs= essential for speed.  Any copying will be faster with 
something higher.  Also, there's the possibility, very annoying, that 
your search string overlaps a place where you read.  I'd probably check 
1M blocks, but advance maybe 950k each time.  Make sure you're reading 
from block offsets for maximum speed.  I know disk editors exist, I 
remember using one on Mac OS 8.6 to find a lost file.  That was back on 
a 6 gig hard drive.


Depending on the file size, you could open the disk in vi and just 
search from there, or just run strings on the disk and pipe it to vi.



Re: Grepping through a disk

2013-03-04 Thread Polytropon
On Mon, 4 Mar 2013 10:09:50 +0100, Damien Fleuriot wrote:
> Hey that's actually a pretty creative way of doing things ;)

It could be more optimal. :-)

My thought is that I could maybe use a better bs= to make
the whole thing run faster. I understand that for every
unit, a subprocess dd | grep is started and an if [] test
is run. Maybe doing this with 1 MB per unit would be better?
Note that I need to grep through 1 TB in 10 kB steps...



> Just to make sure, you've stopped daemons and all the stuff
> that could potentially write to the drive and nuke your
> blocks right ?

Of course. The /dev/ad6 disk is a separate data disk which
is not in use at the moment (unmounted). Still it is possible
that the block has already been overwritten, but when the
search has finished, it will at least be almost certain what
state the data is in.

I would rewrite the file, but my eidetic memory is not working
well anymore, so I can only remember parts of it... :-(




-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: Grepping through a disk

2013-03-04 Thread Damien Fleuriot

On 4 Mar 2013, at 01:36, Polytropon  wrote:

> Due to a fsck file system repair I lost the content of a file
> I consider important, but it hasn't been backed up yet. The
> file name is still present, but no blocks are associated
> (file size is zero). I hope the data blocks (which are now
> probably marked "unused") are still intact, so I thought
> I'd search for them because I can remember specific text
> that should have been in that file.
> 
> As I don't need any fancy stuff like a progress bar, I
> decided to write a simple command, and I quickly got
> something up and running which I _assume_ will do what
> I need.
> 
> This is the command I've been running interactively in bash:
> 
>$ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout bs=10240 
> count=1 skip=${N} 2>/dev/null | grep ""; if [ $? -eq 0 ]; then 
> break; fi; N=`expr ${N} + 1`; done
> 
> To make it look a bit better and illustrate the simple
> logic behind my idea:
> 
>N=0
>while true; do
>echo "${N}"
>dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} \
>2>/dev/null | grep ""
>if [ $? -eq 0 ]; then
>break
>fi
>N=`expr ${N} + 1`
>done
> 
> Here  refers to the text. It's only a small, but
> very distinctive portion. I'm searching in blocks of 10 kB
> so it's easier to continue in case something has been found.
> I plan to output the resulting "block" (it's not a real disk
> block, I know, it's simply a unit of 10 kB disk space) and
> maybe the previous and next one (in case the file, the _real_
> block containing the data, has been split across more than
> one of those units). I will then clean the "garbage" (maybe
> from other files) because I can easily determine the beginning
> and the end of the file.
> 
> Needless to say, it's a _text_ file.
> 
> I understand that grep operates on text files, but it will
> also happily return 0 if the text to search for will appear
> in a binary file, and possibly return the whole file as a
> search result (in case there are no newlines in it).
> 
> My questions:
> 
> 1. Is this the proper way of stupidly searching a disk?
> 
> 2. Is the block size (bs= parameter to dd) good, or should
>   I use a different value for better performance?
> 
> 3. Is there a program known that already implements the
>   functionality I need in terms of data recovery?
> 
> Results so far:
> 
> The disk in question is a 1 TB SATA disk. The command has
> been running for more than 12 hours now and returned one
> false-positive result, so basically it seems to work, but
> maybe I can do better? I can always continue search by
> adding 1 to ${N}, set it as start value, and re-run the
> command.
> 
> Any suggestion is welcome!
> 
> 


Hey that's actually a pretty creative way of doing things ;)

Just to make sure, you've stopped daemons and all the stuff that could 
potentially write to the drive and nuke your blocks right ?



Grepping through a disk

2013-03-03 Thread Polytropon
Due to a fsck file system repair I lost the content of a file
I consider important, but it hasn't been backed up yet. The
file name is still present, but no blocks are associated
(file size is zero). I hope the data blocks (which are now
probably marked "unused") are still intact, so I thought
I'd search for them because I can remember specific text
that should have been in that file.

As I don't need any fancy stuff like a progress bar, I
decided to write a simple command, and I quickly got
something up and running which I _assume_ will do what
I need.

This is the command I've been running interactively in bash:

$ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout 
bs=10240 count=1 skip=${N} 2>/dev/null | grep ""; if [ $? -eq 0 ]; 
then break; fi; N=`expr ${N} + 1`; done

To make it look a bit better and illustrate the simple
logic behind my idea:

N=0
while true; do
echo "${N}"
dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} \
2>/dev/null | grep ""
if [ $? -eq 0 ]; then
break
fi
N=`expr ${N} + 1`
done

Here  refers to the text. It's only a small, but
very distinctive portion. I'm searching in blocks of 10 kB
so it's easier to continue in case something has been found.
I plan to output the resulting "block" (it's not a real disk
block, I know, it's simply a unit of 10 kB disk space) and
maybe the previous and next one (in case the file, the _real_
block containing the data, has been split across more than
one of those units). I will then clean the "garbage" (maybe
from other files) because I can easily determine the beginning
and the end of the file.

Needless to say, it's a _text_ file.

I understand that grep operates on text files, but it will
also happily return 0 if the text to search for appears
in a binary file, and possibly return the whole file as a
search result (in case there are no newlines in it).

My questions:

1. Is this the proper way of stupidly searching a disk?

2. Is the block size (bs= parameter to dd) good, or should
   I use a different value for better performance?

3. Is there a program known that already implements the
   functionality I need in terms of data recovery?

Results so far:

The disk in question is a 1 TB SATA disk. The command has
been running for more than 12 hours now and returned one
false-positive result, so basically it seems to work, but
maybe I can do better? I can always continue the search by
adding 1 to ${N}, setting it as the start value, and re-running
the command.

Any suggestion is welcome!
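One alternative worth noting (a sketch, assuming the -a/-b/-o options of FreeBSD's and GNU's grep; device and pattern are placeholders for the elided values): let grep read the device in a single sequential pass and print the byte offset of every hit, avoiding the per-block dd/grep process churn entirely:

```shell
# find_offsets IMG PATTERN -- one sequential pass with grep itself:
#   -a  treat binary data as text (no "Binary file matches" short-circuit)
#   -b  print the byte offset of each match
#   -o  print only the match, not the whole (possibly enormous) "line"
# Device and pattern are placeholders for the thread's elided values.
find_offsets() {
    grep -abo -- "$2" "$1"
}
# usage: find_offsets /dev/ad6 'some distinctive text'
# a reported offset converts directly into a dd skip= value, e.g.
#   dd if=/dev/ad6 bs=10240 count=3 skip=$((offset / 10240 - 1))
```

The trade-off is losing the block-counter progress indicator; the offsets in the output serve as a rough substitute.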



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...