Using Spark via the Thrift server is fine, but it limits you to plain SQL queries. For any more complex Spark logic you first have to submit a job, write the result into a table, and then query that table. This has obvious limitations: a) the user executing the query cannot pass in parameters, b) the user executing the query has no idea how current the intermediate table is, and c) it requires compiling the analytic logic into a jar file, uploading it, and submitting it.
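To make the two-step workflow concrete, it looks roughly like this today (the class, jar, host, and table names below are made up for illustration):

```shell
# Step 1: compile the analytic logic into a jar, upload it, and submit it
# as a batch job that materializes its result into a table.
spark-submit --class com.example.CategorizeFiles \
  --master yarn \
  analytics-assembly.jar

# Step 2: only afterwards can a JDBC user query the (possibly stale)
# intermediate table through the Thrift server, e.g. with beeline:
beeline -u jdbc:hive2://thrift-host:10000 \
  -e "SELECT * FROM categorized_files WHERE filename LIKE 'text_2016_07_%'"
```

Note that nothing in step 2 can influence step 1: the WHERE clause is applied only after the full result has already been computed and written out.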
Using spark-shell everything is obviously much more interactive: you write the Scala code line by line and visualize the result, but not via JDBC/Thrift. Wouldn't it make sense to support a syntax like

create temporary table myview using Scala options (sourcecode 'val dataframe = sqlContext...... %table dataframe');

Example: there is a directory with millions of files. A trained MLlib model is used to categorize these files; the output is a dataframe. Via JDBC you want to get the categorization of only the files named text_2016_07_*.txt.

Does this make sense? I can't see how this could be done today without a lot of disadvantages, but I am far from being an expert, so please bear with me.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-code-as-spark-view-tp27353.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.