I had an idea last week and implemented it quickly over the weekend. You know 
how bash hackers write pipelines of operations like grep, sort, uniq, sed? 
Those are basically relational operations, but the pipelines are difficult to 
write because you’re dealing with space-separated strings. So my idea was to 
let people write the same pipelines in SQL. That meant making SQL easily 
available from the command line, and exposing the data sources of those 
operations (shell commands such as du, ps, git log) as tables.

I call this the OS adapter, and the script that launches SQL from the command 
line is sqlsh. To find the 5 most prolific committers you’d type

$ git log | grep Author: | sort | uniq -c | sort -nr | head -5

and now you can instead type

$ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by 2 desc limit 5

and Calcite reads from the same data source and executes the query using its 
operators.
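
To make the equivalence concrete, here is a minimal Python sketch of what both commands compute. It is not Calcite code and does not reflect how the OS adapter is implemented. The sample log text and the helper names (git_commits, top_committers) are made up for illustration, and the "table" has only an author column.

```python
from collections import Counter

# Hypothetical sample of `git log` output; in practice the command supplies this.
SAMPLE_LOG = """\
commit 1111111
Author: Alice <alice@example.com>
Date:   Mon Jul 17 10:00:00 2017 -0700

    First change

commit 2222222
Author: Bob <bob@example.com>
Date:   Tue Jul 18 11:00:00 2017 -0700

    Second change

commit 3333333
Author: Alice <alice@example.com>
Date:   Wed Jul 19 12:00:00 2017 -0700

    Third change
"""


def git_commits(log_text):
    """Yield one row per commit, playing the role of a scan of a commits table."""
    prefix = "Author: "
    for line in log_text.splitlines():
        # Header lines start in column 0; commit messages are indented.
        if line.startswith(prefix):
            yield {"author": line[len(prefix):].strip()}


def top_committers(log_text, limit=5):
    """GROUP BY author, COUNT(*), ORDER BY count DESC, LIMIT n."""
    counts = Counter(row["author"] for row in git_commits(log_text))
    return counts.most_common(limit)


if __name__ == "__main__":
    for author, commits in top_committers(SAMPLE_LOG):
        print(commits, author)
```

The row generator corresponds to the table, and the Counter call stands in for the group-by, sort and limit operators that a SQL engine would apply.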

It’s ready to commit. Can someone please review 
https://issues.apache.org/jira/browse/CALCITE-1896?

It would be great to get contributions to this. Adding new data sources 
(/etc/passwd, netstat, the file system, apt, the maven repo) should be fairly 
straightforward. 
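
As a rough illustration of why a source like /etc/passwd maps easily onto a table, here is a small Python sketch that turns passwd-style text into rows. It is only a sketch of the parsing step, not the adapter's API. The function name passwd_rows and the sample text are hypothetical, and the column names are simply the seven standard passwd fields.

```python
# The seven colon-separated fields of a passwd entry, in order.
PASSWD_COLUMNS = ("name", "password", "uid", "gid", "gecos", "home", "shell")

# Hypothetical sample content; a real source would read /etc/passwd instead.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
# a comment line
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
"""


def passwd_rows(text):
    """Yield one dict per well-formed entry, with uid and gid as integers."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) != len(PASSWD_COLUMNS):
            continue  # skip malformed entries rather than guess
        row = dict(zip(PASSWD_COLUMNS, fields))
        row["uid"] = int(row["uid"])
        row["gid"] = int(row["gid"])
        yield row


if __name__ == "__main__":
    for row in passwd_rows(SAMPLE_PASSWD):
        print(row["name"], row["uid"], row["shell"])
```

Once a source yields typed rows like these, filtering, grouping and sorting can be left to the query engine.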

Julian


