dzamo commented on pull request #2359:
URL: https://github.com/apache/drill/pull/2359#issuecomment-962569638
> The problem is, a tool that tries to be both a dessert topping and a floor wax (let's see how old the readers are with this one), ends up being good at neither.

@paul-rogers you got me with this idiom, but I like it! The broader topic is super interesting. If SQLite started adding the features needed to compete with Oracle Database 21c it would quickly fail at being SQLite. If Linux tried to be an OS kernel for both TVs and supercomputers it would... continue to dominate both extremes! There are some twists here!

Pigeonholing formats into small-scale and large-scale is also a tricky business. For example, we naturally want to declare PDF a desktop format, but I can easily imagine a conversation like the following.

"Hey Bob, remember that we sent decades of paper archives from the basement out to that big scanning centre for digitisation? They've come back as millions of pages of PDFs. Someone just asked me if we can help them find all invoices containing a particular SKU, and pull out the price on that line. The ERP system only has the last 10 years loaded into it and they want to go back further."

"Chuck 'em in HDFS, we'll run a Drill query."

"But PDF is a desktop publishing format, not a big data format! Surely our big data cluster will want nothing to do with it!?"

"Drill's got a plugin architecture, which has led to people adding support for all sorts of weird and wonderful formats. Querying PDFs is a dubious business, but we'll know after ~10 lines of SQL if we can do this with Drill or not. If not, miserable days or weeks of programming with a PDF library await one of our interns."

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org