I am trying to use Spark 1.3 (Standalone) against Hive 1.2 running on Hadoop 2.6. I looked at the ThriftServer2 logs and realized that the server was not starting properly, because it was failing to create its server socket. In fact, I had passed beeline the URI of my HiveServer2 service (the one launched from Hive), so the beeline in Spark was talking directly to Hive's HiveServer2 and simply using it as a plain Hive service.

I was able to fix starting the ThriftServer2 in Spark by moving it to a different port.
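For the record, the fix was roughly the following (10001 is just an arbitrary free port on my machine, and I am assuming --hiveconf is the right way to override the listening port):

$SPARK_HOME/sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10001 \
  --master spark://spark-master-node-ip:7077

after which I point beeline at Spark's Thrift server instead of Hive's:

!connect jdbc:hive2://spark-master-node-ip:10001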
But I guess the missing puzzle piece for me is: how does Spark SQL re-use a table that has already been created in Hive? I mean, do I have to write an application that uses HiveContext and submit it to Spark for execution, or is there a way to run SQL scripts directly from the command line (in distributed mode, on the cluster), similar to the way one would use the Hive (or Shark) command line by passing a query file with the -f flag? Looking at the Spark SQL documentation, it seems that this is possible; please correct me if I am wrong. To make the question concrete, here is what I have in mind:
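For the command-line route, I am guessing at something like this (my_queries.sql is a placeholder file name, and I am assuming bin/spark-sql understands the Hive CLI's -f flag and picks up hive-site.xml from $SPARK_HOME/conf):

$SPARK_HOME/bin/spark-sql --master spark://spark-master-node-ip:7077 -f my_queries.sql

And for the application route, a minimal untested sketch of what I would spark-submit (the table name is made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTableQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveTableQuery"))
    // HiveContext reads hive-site.xml from the classpath, so it sees the
    // tables that are already defined in the Hive metastore
    val hiveContext = new HiveContext(sc)
    // The query itself should run on the Spark workers, not as a Hadoop MR job
    hiveContext.sql("SELECT COUNT(*) FROM my_existing_hive_table")
      .collect()
      .foreach(println)
    sc.stop()
  }
}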
On Mon, Jun 8, 2015 at 6:56 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
> On 6/9/15 8:42 AM, James Pirz wrote:
>
> Thanks for the help!
> I am actually trying to use Spark SQL to run queries against tables that
> I've defined in Hive.
>
> I follow these steps:
> - I start hiveserver2, and in Spark I start Spark's Thrift server by:
> $SPARK_HOME/sbin/start-thriftserver.sh --master
> spark://spark-master-node-ip:7077
>
> - and I start beeline:
> $SPARK_HOME/bin/beeline
>
> - In my beeline session, I connect to my running hiveserver2:
> !connect jdbc:hive2://hive-node-ip:10000
>
> and I can run queries successfully. But based on the hiveserver2 logs, it
> seems it actually uses Hadoop's MR to run the queries, *not* Spark's
> workers. My goal is to access Hive's tables' data, but to run the queries
> through Spark SQL using Spark workers (not Hadoop).
>
> Hm, interesting. HiveThriftServer2 should never issue MR jobs to perform
> queries. I did receive two reports in the past which also say MR jobs
> instead of Spark jobs were issued to perform the SQL query. However, I only
> reproduced this issue in a rare corner case, which uses HTTP mode to
> connect to Hive 0.12.0. Apparently this isn't your case. Would you mind
> providing more details so that I can dig in? The following information
> would be very helpful:
>
> 1. Hive version
> 2. A copy of your hive-site.xml
> 3. Hadoop version
> 4. Full HiveThriftServer2 log (which can be found in $SPARK_HOME/logs)
>
> Thanks in advance!
>
>
> Is it possible to do that via Spark SQL (its CLI) or through its Thrift
> server? (I tried to find some basic examples in the documentation, but I
> was not able to.) Any suggestion or hint on how I can do that would be
> highly appreciated.
>
> Thnx
>
> On Sun, Jun 7, 2015 at 6:39 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> On 6/6/15 9:06 AM, James Pirz wrote:
>>
>> I am pretty new to Spark, and using Spark 1.3.1 I am trying to use
>> Spark SQL to run some SQL scripts on the cluster. I realized that for
>> better performance, it is a good idea to use Parquet files. I have 2
>> questions regarding that:
>>
>> 1) If I want to use Spark SQL against *partitioned & bucketed* tables
>> with Parquet format in Hive, does the provided Spark binary on the Apache
>> website support that, or do I need to build a new Spark binary with some
>> additional flags? (I found a note
>> <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
>> in the documentation about enabling Hive support, but I could not fully
>> work out what the correct way of building is, if I do need to build.)
>>
>> Yes, Hive support is enabled by default now for the binaries on the
>> website. However, Spark SQL doesn't support buckets yet.
>>
>> 2) Does running Spark SQL against tables in Hive degrade performance,
>> and is it better to load the Parquet files directly into HDFS, or is
>> having Hive in the picture harmless?
>>
>> If you're using Parquet, then it should be fine, since by default Spark
>> SQL uses its own native Parquet support to read Parquet Hive tables.
>>
>> Thnx
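P.S. Regarding question 2 above: if I end up bypassing Hive, my understanding is that I could point Spark SQL at the Parquet files directly, along these lines (an untested sketch against the 1.3 API, run e.g. in spark-shell where sqlContext is provided; the HDFS path and table name are placeholders):

// Load the Parquet files directly and expose them as a temporary table
val df = sqlContext.parquetFile("hdfs://namenode:8020/path/to/parquet")
df.registerTempTable("my_parquet_table")
sqlContext.sql("SELECT COUNT(*) FROM my_parquet_table").show()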