Hey Michael, Cheng,

Thanks for the replies. Sadly, I can't remember the specific error, so I'm going to chalk it up to user error, especially since others on the list have not had a problem.

@michael By the way, I was at the Spark 1.1 meetup yesterday. Great event, very informative, cheers and keep doing more!

@cheng Got it, cheers. Fortunately we don't have to deal with this use case, but that's good to know (especially the $SPARK_HOME bit).

On Wed, Aug 27, 2014 at 3:36 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Hey Matt, if you want to access existing Hive data, you still need to
> run a Hive metastore service, and provide a proper hive-site.xml (just
> drop it in $SPARK_HOME/conf).
>
> Could you provide the error log you saw?
>
> On Wed, Aug 27, 2014 at 12:09 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> I would expect that to work. What exactly is the error?
>>
>> On Wed, Aug 27, 2014 at 6:02 AM, Matt Chu <m...@kabam.com> wrote:
>>
>>> (apologies for sending this twice, first via nabble; didn't realize it
>>> wouldn't get forwarded)
>>>
>>> Hey, I know it's not officially released yet, but I'm trying to
>>> understand (and run) the Thrift-based JDBC server, in order to enable
>>> remote JDBC access to our dev cluster.
>>>
>>> Before asking about details, is my understanding of this correct?
>>> `sbin/start-thriftserver.sh` is a JDBC/Hive server that doesn't require
>>> running a Hive+MR cluster (i.e. just Spark/Spark+YARN)?
>>>
>>> Assuming yes, I have hope that it all basically works, just that some
>>> documentation needs to be cleaned up:
>>>
>>> - I found a release page implying that 1.1 will be released "pretty
>>>   soon-ish":
>>>   https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
>>> - I can find recent (within the last 30 days or so) activity with
>>>   promising titles: ["Updated Spark SQL README to include the
>>>   hive-thriftserver module"](https://github.com/apache/spark/pull/1867),
>>>   ["[SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile
>>>   fix)"](https://github.com/apache/spark/pull/1620)
>>>
>>> Am I following all the right email threads, issue trackers, and
>>> whatnot?
>>>
>>> Specifically, I tried:
>>>
>>> 1. Building off of `branch-1.1`, synced as of ~today (2014 Aug 25)
>>> 2. Running `sbin/start-thriftserver.sh` in `yarn-client` mode
>>> 3. Can see the process running, and the spark context/app created in
>>>    yarn logs, and can connect to the thrift server on the default port
>>>    of 10000 using `bin/beeline`
>>> 4. However, when I try to find out what that cluster has via
>>>    `show tables;`, in the logs I see a connection error to some (what I
>>>    assume to be) random port.
>>>
>>> So what service am I forgetting/too ignorant to run? Or did I
>>> misunderstand and we do need a live Hive instance to back thriftserver? Or
>>> is this a YARN-specific issue?
>>>
>>> Only recently started learning the ecosystem and community, so apologies
>>> for the longer post and lots of questions. :)
>>>
>>> Matt