On Fri, 8 Jun 2012 14:05:06 -0700 David Leimbach <leim...@gmail.com> wrote:
> On Fri, Jun 8, 2012 at 8:44 AM, erik quanstrom
> <quans...@labs.coraid.com>wrote:
>
> > i haven't seen any evidence that strongly typed files are a good idea.
> > but maybe
> > others have?
> >
>
> I can tell you that the "Big Data Analytics" explosion that's been going on
> that is creating lots of jobs for data scientists, has an awful lot to do
> with the fact that files on a filesystem are unstructured or "untyped"
> (without a schema).

Most of this is passing me by, but I would really like a search engine
for my own hard drive, if that's part of what you're talking about. I
guess with a search engine types matter, but there are very few file
types you'd want to read, view, or watch which can't be determined by
looking at the first part of the file. Within the file each type has its
own structure. I don't really understand what's being said by
"unstructured" here unless they want programs to handle all types
without recognising each one individually. Even then I don't see the
problem because many file types are just containers anyway, but I should
probably stop there as I really don't know what comes of analytics and I
don't have big data for any purposes other than to be searched.

>
> On systems like iOS, applications don't expose a file system to the end
> user but instead apps that work with PDFs can be used to forward those
> documents to other applications that understand PDFs. This corresponds
> more to "data types". The type-less mode is more general, and the typed
> mode seems easier to reason about.

It sounds like a weaker version of old PalmOS which had databases not
files and each db had a type associating it with an app. I think I
prefer the weaker version although I will probably dissect some of my
numerous Palm dbs if I find the time. I'd like to see how they're
structured.

>
> In fact, the people who will eat the lunch of these people wrangling
> unstructured data, are the ones that figure out how to structure the data
> in a way that it's not a problem anymore.

I will be very, VERY interested if they manage to find something that
actually works, considering the last attempt was XML. ;) Even if you
don't count XML as an attempt at a universal structure, it's certainly
used for that, a lot.
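
P.S. To make the "first part of the file" point concrete, here is a
rough Python sketch of that kind of sniffing. The table and the names
(SIGNATURES, sniff, sniff_file) are made up for illustration and cover
only a handful of well-known leading signatures; a real tool such as
file(1) knows far more, and some formats can't be identified this way
at all.

```python
# Illustrative only: a tiny, hypothetical table of leading byte
# signatures ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip container"),  # also the outer wrapper of other formats
    (b"\x1f\x8b", "gzip"),
]


def sniff(head: bytes) -> str:
    """Guess a type from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str, nbytes: int = 16) -> str:
    """Read only the start of the file at `path` and guess its type."""
    with open(path, "rb") as f:
        return sniff(f.read(nbytes))
```

Note that the zip entry only tells you about the container, not what is
inside it, which is the "many file types are just containers" case: the
next layer of structure still has to be recognised separately.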