On 4/26/23 15:48, Nicolas George wrote:
> David Christensen (12023-04-26):
>> I suggest hashing the document content rather than the URL. This would work
>> nicely for static documents.
> That will be very convenient to retrieve the document content from the
> URL.
My suggestion assumes that the URL => hash => content mapping is saved
somehow. For example, save the content in a file named after the hash
and save the URL in a file whose name is the hash plus a suffix.
Finding a document by URL then becomes a grep(1) invocation.
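A minimal sketch of that layout, in Python for concreteness; the cache path, URL, and sample content are illustrative, and the grep(1) step is shown as an in-process scan:

```python
# Sketch: store content under its hash, and the URL in a hash-plus-suffix file.
import hashlib
import pathlib

cache = pathlib.Path("cache")
cache.mkdir(exist_ok=True)

url = "https://example.com/doc.html"        # illustrative
content = b"hello world"                    # stands in for the fetched document

h = hashlib.sha256(content).hexdigest()
(cache / h).write_bytes(content)               # content file named after its hash
(cache / (h + ".url")).write_text(url + "\n")  # URL file: hash plus a suffix

# Finding a document by URL -- the grep(1) invocation, done here in-process:
hits = [p for p in cache.glob("*.url") if url in p.read_text()]
```

Stripping the ".url" suffix from a hit gives the path of the cached content
itself.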
Things get more interesting when you approach the problem as a database.
Save the content wherever and put the metadata into a table -- content
hash (primary key), URL, download timestamp, author, subject, title,
keywords, etc. Create fully inverted indexes. Create a search engine.
Create a spider. Implementation could range from a CSV/TSV flat-file
and shell/P* scripts, to a desktop database/UI, to a LAMP stack, and
beyond (NoSQL, N-tier). There are distributed file sharing systems
based on such ideas.
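The metadata-table end of that range can be sketched with sqlite3 from the
Python standard library; the column names and sample row are assumptions, not
a fixed schema:

```python
# Sketch: metadata table keyed by content hash, as described above.
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE documents (
        hash     TEXT PRIMARY KEY,   -- content hash
        url      TEXT,
        fetched  TEXT,               -- download timestamp
        author   TEXT,
        subject  TEXT,
        title    TEXT,
        keywords TEXT
    )
""")

content = b"hello world"                    # stands in for stored content
h = hashlib.sha256(content).hexdigest()
db.execute(
    "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?, ?)",
    (h, "https://example.com/doc.html", "2023-04-26T15:48:00",
     "unknown", "example", "Example document", "hash,cache,index"),
)

# A crude keyword lookup; a real search engine would use inverted indexes
# (e.g. SQLite's FTS5 extension) rather than LIKE scans.
rows = db.execute(
    "SELECT url, title FROM documents WHERE keywords LIKE ?", ("%cache%",)
).fetchall()
```

The same schema ports unchanged to a LAMP stack; only the driver and the
full-text-index mechanism change.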
David