Hello,

I downloaded Apache Spark 1.6.1 pre-built for Hadoop 2.6 <http://www.apache.org/dyn/closer.lua/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz>. When I create a table, an empty directory with the same name is created in /user/hive/warehouse. I created tables with the following kind of statement:
> create table aTable (aColumn string)

When I place text files in the directory, e.g. /user/hive/warehouse/atable/text-file, I can query the contents with "select * from aTable", for example.

When I create a table with the following statement, I can only query the specified file (/path/to/json/file):

> CREATE TABLE jsonTable USING org.apache.spark.sql.json OPTIONS ( path
> "/path/to/json/file" )

A directory, e.g. /user/hive/warehouse/jsontable, is created, but if I put files in there, queries do not access the contents of those files.

Is this related to managed versus external tables, or is there another reason? Tables created with USING org.apache.spark.sql.json... are external tables, and tables created by specifying columns are managed.

How do you make a managed table in the same way the external tables are created above, i.e. without specifying columns and instead creating columns based on the JSON content? I would expect queries on the managed table to give access to data in files after the files are put in the managed table directory, as I have seen with the managed tables I have created so far.

Thanks very much,
Brendan