Re: Grepping through a disk
On Mon, 4 Mar 2013, Polytropon wrote:

> The file size of the file I'm searching for is less than 10 kB. It's a
> relatively small text file which got some subsequent additions in the
> last few days, but hasn't been part of the backup job yet.

There have been some good suggestions. I would use a large buffer with dd, say 1M or more, both for speed and to reduce the chance of hitting only part of the search string.

For the future, look at sysutils/rsnapshot. Easy to set up, space-efficient, and provides an easily accessed file history.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
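The large-buffer suggestion can be sketched as a single sequential pass instead of a per-block loop; this is my own illustration, with a small demo file standing in for /dev/ad6 (grep -a treats binary input as text, -b prefixes each matching line with its byte offset):

```shell
# A minimal sketch of the large-buffer idea: one dd pass with a big block
# size, piped once through grep instead of forking per 10 kB window.
# IMG is a small demo file standing in for /dev/ad6.
IMG=/tmp/bigbuf-demo.img
printf 'junk\nthe missing text\n' > "$IMG"
dd if="$IMG" bs=64k 2>/dev/null | grep -ab "missing text"
```

On a real device the printed byte offset tells you exactly where to point a follow-up dd skip=.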
Re: Grepping through a disk
On Mon, Mar 4, 2013 at 1:36 AM, Polytropon wrote:

> Any suggestion is welcome!

How about crawling the metadata, locating each block that is already allocated, and skipping those blocks when you scan the disk? That could reduce the search space significantly. blkls(1) et al. from the Sleuth Kit are your friends.

Good luck,
-cpghost.

--
Cordula's Web. http://www.cordula.ws/
Re: Grepping through a disk
Hi Polytropon & cc questions@

> Any suggestion is welcome!

Ideas:

- A themed list: freebsd...@freebsd.org
- There's a bunch of fs tools in /usr/ports/sysutils/
- My http://www.berklix.com/~jhs/src/bsd/jhs/bin/public/slice/ slices large images such as tapes & disks (the slice names would also give numbers convertible to offsets, probably useful with e.g. man fsdb).
- A bit of custom C should run a lot faster than shells & greps; e.g. when I was looking for nasty files from a bad SCSI controller, I wrote http://www.berklix.com/~jhs/src/bsd/jhs/bin/public/8f/

One could run e.g. slice asynchronously & suspend it (^Z) when you run out of space, & periodically run some custom C (like 8f.c) or some find / grep -v / rm loop to discard most slices as of no interest. Then resume slicing. OK, that's doing writes too, so it's slower than just reads & a later dd with seek=whatever; it depends how conservative one's feeling about doing reruns with other search criteria.

You mentioned the risk of the text string being chopped across a slice/block boundary. Certainly a risk. Presumably the solution is to search twice: the 2nd time after a dd with a half block/slice size offset, then slice/search again. If you run out of space to do that, you might write a temporary disklabel/bsdlabel with an extra partition at a half block offset .. dodgy stuff that, do it while you're wide awake :-)

Always a pain, these scenarios, losing hours of human & CPU time. I hope the data's worth it, good luck.

Cheers,
Julian
--
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
Reply below not above, like a play script. Indent old text with "> ".
Send plain text. No quoted-printable, HTML, base64, multipart/alternative.
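The "search twice, second time at a half-window offset" idea can be sketched like this. This is my own illustration (a small demo file stands in for the real device, and the file name and needle are mine): a needle shorter than half a window that straddles a window boundary in pass 1 lies wholly inside some window of pass 2.

```shell
# Two passes over the same image with 10 kB windows; pass 2 starts half a
# window (5 kB) in, so boundary-straddling matches cannot be missed.
IMG=/tmp/halfoff-demo.img
dd if=/dev/zero of="$IMG" bs=1k count=64 2>/dev/null
# Plant the needle so it straddles the 10 kB window boundary at byte 10240.
printf 'needle' | dd of="$IMG" bs=1 seek=10237 conv=notrunc 2>/dev/null

scan() {   # $1 = starting half-window (0 = pass 1, 1 = half-offset pass 2)
    N=$1
    while [ "$(dd if="$IMG" bs=5120 skip="$N" count=1 2>/dev/null | wc -c)" -gt 0 ]; do
        # One window = two 5 kB half-windows = 10 kB.
        if dd if="$IMG" bs=5120 skip="$N" count=2 2>/dev/null | grep -q -a needle; then
            echo "found in window starting at byte $((N * 5120))"
            return 0
        fi
        N=$((N + 2))
    done
    return 1
}
scan 0 || scan 1
```

Pass 1 sees only the two halves of the needle and fails; pass 2 finds it whole in the window starting at byte 5120.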
Re: Grepping through a disk
On Mon, 4 Mar 2013 11:29:00 +, Steve O'Hara-Smith wrote:

> On Mon, 4 Mar 2013 12:15:24 +0100 Polytropon wrote:
>
> > But I don't know how to do this. From reading "man dd" my impression
> > (consistent with my experience) is that the option skip= operates in
> > units of bs= size, so I'm not sure how to compose a command that reads
> > units of 1 MB, but skips in units of 950 kB. Maybe some parts of my
> > memory have also been marked "unused" by fsck. :-)
>
> Not too hard (you'll kick yourself when you read down) - translation
> to valid shell script is left as an exercise for the reader :)
>
> bs=50k count=(n*20) skip=(n*20 - 1)
>
> Probably nicer to use powers of 2
>
> bs=64k count=(n*16) skip=(n*16 - 1)

Thanks for the pointer. I was so concentrated on finding the answer within dd that I hadn't thought about that. It's easy to write this in shell code.

As a conclusion, I will apply for further IQ reduction; it seems that I have enough spare brain power I don't use anyway. :-)

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: Grepping through a disk
On Mon, 4 Mar 2013 12:15:24 +0100 Polytropon wrote:

> But I don't know how to do this. From reading "man dd" my impression
> (consistent with my experience) is that the option skip= operates in
> units of bs= size, so I'm not sure how to compose a command that reads
> units of 1 MB, but skips in units of 950 kB. Maybe some parts of my
> memory have also been marked "unused" by fsck. :-)

Not too hard (you'll kick yourself when you read down) - translation to valid shell script is left as an exercise for the reader :)

bs=50k count=(n*20) skip=(n*20 - 1)

Probably nicer to use powers of 2

bs=64k count=(n*16) skip=(n*16 - 1)

--
Steve O'Hara-Smith
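One concrete rendering of the read-large/advance-less idea (the arithmetic here is my own, not a literal transcription of the formulas above): read 16 x 64 kB blocks (1 MiB) per window but advance only 15 blocks, so consecutive windows overlap by 64 kB and any needle up to 64 kB is fully contained in at least one window. A small demo image stands in for /dev/ad6:

```shell
# Overlapping 1 MiB windows of 16 x 64 kB blocks, advancing 15 blocks per
# step; stops when dd reads zero bytes (end of image).
IMG=/tmp/window-demo.img
dd if=/dev/zero of="$IMG" bs=64k count=48 2>/dev/null          # 3 MiB demo image
printf 'distinctive phrase' | dd of="$IMG" bs=1 seek=2097152 conv=notrunc 2>/dev/null

N=0
while [ "$(dd if="$IMG" bs=64k skip=$((N * 15)) count=1 2>/dev/null | wc -c)" -gt 0 ]; do
    if dd if="$IMG" bs=64k skip=$((N * 15)) count=16 2>/dev/null \
            | grep -q -a "distinctive phrase"; then
        echo "match in window $N (starts at byte $((N * 15 * 65536)))"
        break
    fi
    N=$((N + 1))
done
```

Unlike the original 10 kB loop, this also terminates cleanly at end of device instead of spinning forever on empty reads.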
Re: Grepping through a disk
On Mon, 04 Mar 2013 04:15:48 -0600, Joshua Isom wrote:

> I'd call bs= essential for speed. Any copying will be faster with
> something higher.

I thought about that. Narrowing down _if_ something has been found is easy; e. g. when the positive 1 MB unit is dd'ed to a file, further work can easily be applied.

> Also, there's the possibility, very annoying, that your search string
> overlaps a place where you read. I'd probably check 1M blocks, but
> advance maybe 950k each time.

I also thought about that; that's why the distinctive phrase I'm searching for is less than 10 characters long. Still it's possible that it appears "across" a boundary of units, no matter how big or small I select bs=.

But I don't know how to do this. From reading "man dd" my impression (consistent with my experience) is that the option skip= operates in units of bs= size, so I'm not sure how to compose a command that reads units of 1 MB, but skips in units of 950 kB. Maybe some parts of my memory have also been marked "unused" by fsck. :-)

> Make sure you're reading from block offsets for maximum speed.

How do I do that? The disk is a normal HDD which has been initialized with "newfs -U" and no further options.

ad6: 953869MB at ata3-master UDMA100 SATA 1.5Gb/s

The file system spans the whole disk.

> I know disk editors exist, I remember using one on Mac OS 8.6 to find a
> lost file. That was back on a 6 gig hard drive.

Ha, I've done stuff like that on DOS with "important business data" many years ago, using the "Norton Disk Doctor" (NDD.EXE) when Norton (today: Symantec) wasn't yet synonymous with "The Yellow Plague". This program actually was quite cool: you could search for things and manipulate disks on several levels (files, file system and below). I had even rewritten an entire partition table from scratch, memory and handheld calculator, after an OS/2 installation went crazy. :-)

> Depending on the file size, you could open the disk in vi and just
> search from there, or just run strings on the disk and pipe it to vi.

You mean like "strings /dev/ad6 | something", without dd? That would give me _no_ progress indicator (with my initial approach I have increasing numbers at least), but I doubt I can load a 1 TB partition in a vi session with only 2 GB RAM in the machine. If I try "strings /dev/ad6" I get a warning: "strings: Warning: '/dev/ad6' is not an ordinary file". True. But this opens a useful use of cat: "cat /dev/ad6 | strings". (Interesting idea, I will investigate this further.)

The file size of the file I'm searching for is less than 10 kB. It's a relatively small text file which got some subsequent additions in the last few days, but hasn't been part of the backup job yet. I can only remember parts of those additions, because as I said my brain is not good with computers. :-)

Or do you think of something different? If yes, please explain. The urge to learn is strong when something went wrong. :-)

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
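The dd-free strings pipeline discussed here can be sketched on a small demo file (file name and contents are mine; strings(1) as in the common binutils/base-system version, which extracts printable runs of four or more characters from binary input):

```shell
# strings(1) pulls printable text out of raw binary data; grep then narrows
# it to the phrase of interest, avoiding any per-block dd loop. A demo file
# with embedded NUL bytes stands in for /dev/ad6.
IMG=/tmp/strings-demo.img
printf 'text before\0\0\0the lost line\0\0binary junk' > "$IMG"
dd if="$IMG" bs=64k 2>/dev/null | strings | grep "lost line"
```

Since dd (not strings) opens the device here, the "not an ordinary file" warning does not apply, and no cat is needed either.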
Re: Grepping through a disk
On 3/3/2013 6:36 PM, Polytropon wrote:

> Due to a fsck file system repair I lost the content of a file I consider
> important, but it hasn't been backed up yet. [...]
>
> This is the command I've been running interactively in bash:
>
> $ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} 2>/dev/null | grep ""; if [ $? -eq 0 ]; then break; fi; N=`expr ${N} + 1`; done
>
> [...]
>
> 1. Is this the proper way of stupidly searching a disk?
>
> 2. Is the block size (bs= parameter to dd) good, or should I use a
> different value for better performance?
>
> 3. Is there a program known that already implements the functionality
> I need in terms of data recovery?
>
> Any suggestion is welcome!

I'd call bs= essential for speed. Any copying will be faster with something higher. Also, there's the possibility, very annoying, that your search string overlaps a place where you read. I'd probably check 1M blocks, but advance maybe 950k each time. Make sure you're reading from block offsets for maximum speed.

I know disk editors exist; I remember using one on Mac OS 8.6 to find a lost file. That was back on a 6 gig hard drive. Depending on the file size, you could open the disk in vi and just search from there, or just run strings on the disk and pipe it to vi.
Re: Grepping through a disk
On Mon, 4 Mar 2013 10:09:50 +0100, Damien Fleuriot wrote:

> Hey that's actually a pretty creative way of doing things ;)

It could be more optimal. :-) My thought is that I could maybe use a better bs= to make the whole thing run faster. I understand that for every unit, a dd | grep subprocess pair is started and an if [ ] test is run. Maybe doing this with 1 MB per unit would be better? Note that I need to grep through 1 TB in 10 kB steps...

> Just to make sure, you've stopped daemons and all the stuff that could
> potentially write to the drive and nuke your blocks right ?

Of course. The /dev/ad6 disk is a separate data disk which is not in use at the moment (unmounted). It is still possible that the block has already been overwritten, but when the search has finished, it will be almost certain what state the data is in. I would rewrite the file, but my eidetic memory is not working well anymore, so I can only remember parts of it... :-(

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
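The per-unit fork cost can be put in numbers. Assuming a round 10^12-byte disk for easy arithmetic (my assumption; the real ad6 is 953869 MB), shell arithmetic gives the number of dd+grep pairs each choice of bs= implies:

```shell
# Rough window counts for a 10^12-byte disk: each window costs one
# dd+grep fork pair, so bs= changes the fork count by two orders
# of magnitude.
echo $((1000000000000 / 10240))      # 10 kB windows: 97656250
echo $((1000000000000 / 1048576))    # 1 MiB windows: 953674
```

Roughly 10^8 fork pairs versus 10^6; with fork+exec overhead on the order of a millisecond each, the 10 kB loop spends on the order of a day just starting processes.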
Re: Grepping through a disk
On 4 Mar 2013, at 01:36, Polytropon wrote:

> Due to a fsck file system repair I lost the content of a file I consider
> important, but it hasn't been backed up yet. [...]
>
> Results so far:
>
> The disk in question is a 1 TB SATA disk. The command has been running
> for more than 12 hours now and returned one false-positive result, so
> basically it seems to work, but maybe I can do better? I can always
> continue search by adding 1 to ${N}, set it as start value, and re-run
> the command.
>
> Any suggestion is welcome!

Hey, that's actually a pretty creative way of doing things ;)

Just to make sure, you've stopped daemons and all the stuff that could potentially write to the drive and nuke your blocks, right?
Grepping through a disk
Due to a fsck file system repair I lost the content of a file I consider important, but it hasn't been backed up yet. The file name is still present, but no blocks are associated (file size is zero). I hope the data blocks (which are now probably marked "unused") are still intact, so I thought I'd search for them, because I can remember specific text that should have been in that file.

As I don't need any fancy stuff like a progress bar, I decided to write a simple command, and I quickly got something up and running which I _assume_ will do what I need.

This is the command I've been running interactively in bash:

$ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} 2>/dev/null | grep ""; if [ $? -eq 0 ]; then break; fi; N=`expr ${N} + 1`; done

To make it look a bit better and illustrate the simple logic behind my idea:

N=0
while true; do
    echo "${N}"
    dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} \
        2>/dev/null | grep ""
    if [ $? -eq 0 ]; then
        break
    fi
    N=`expr ${N} + 1`
done

The search string (elided here) is only a small, but very distinctive portion of the text. I'm searching in blocks of 10 kB so it's easier to continue in case something has been found. I plan to output the resulting "block" (it's not a real disk block, I know, it's simply a unit of 10 kB disk space) and maybe the previous and next one (in case the file, the _real_ block containing the data, has been split across more than one of those units). I will then clean out the "garbage" (maybe from other files) because I can easily determine the beginning and the end of the file.

Needless to say, it's a _text_ file.

I understand that grep operates on text files, but it will also happily return 0 if the text to search for appears in a binary file, and possibly return the whole file as a search result (in case there are no newlines in it).

My questions:

1. Is this the proper way of stupidly searching a disk?

2. Is the block size (bs= parameter to dd) good, or should I use a different value for better performance?

3. Is there a program known that already implements the functionality I need in terms of data recovery?

Results so far:

The disk in question is a 1 TB SATA disk. The command has been running for more than 12 hours now and returned one false-positive result, so basically it seems to work, but maybe I can do better? I can always continue the search by adding 1 to ${N}, setting it as the start value, and re-running the command.

Any suggestion is welcome!

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
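The "output the resulting block and maybe the previous and next one" plan can be sketched like this (my own illustration: the demo image, file names, and planted text are mine, with the demo image standing in for /dev/ad6):

```shell
# Extract the matching 10 kB unit plus one unit on each side for manual
# cleanup, as described above.
IMG=/tmp/extract-demo.img
dd if=/dev/zero of="$IMG" bs=10240 count=8 2>/dev/null
printf 'the lost text' | dd of="$IMG" bs=1 seek=30000 conv=notrunc 2>/dev/null

N=2    # the unit where the scan stopped (30000 / 10240 = 2)
dd if="$IMG" bs=10240 skip=$((N - 1)) count=3 2>/dev/null > /tmp/recovered.chunk
grep -a -c "the lost text" /tmp/recovered.chunk    # prints 1
```

The 30 kB chunk in /tmp/recovered.chunk can then be trimmed by hand, since the beginning and end of the lost file are recognizable.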