Re: When queried through HiveContext, does Hive execute these queries using its execution engine (the default is MapReduce), or does Spark just read the data and perform those queries itself?

2016-06-08 Thread lalit sharma
To add to what Vikash said above, a bit more on the internals:
1. There are two components that work together to achieve the Hive + Spark
integration:
   a. HiveContext, which extends SQLContext and adds Hive-specific logic,
e.g. loading the jars needed to talk to the underlying metastore DB and
reading the configuration in hive-site.xml.
   b. HiveThriftServer2, which builds on the native HiveServer2 and adds
logic for creating sessions and handling operations.
2. Once the Thrift server is up, authentication and session management are
delegated to the Hive classes. Once a query is parsed, the logical plan is
created and passed on to build a DataFrame.

So there is no MapReduce; Spark reuses the needed pieces from Hive and runs
the query on its own execution engine, as the sketch below illustrates.
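A minimal sketch of that flow (assuming Spark 1.x APIs, matching the versions
discussed on this list, and a hypothetical Hive table named web_logs); the
explain() output is where you can see that the plan consists of Spark
operators rather than MapReduce stages:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveContextExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-context-example"))
        // Picks up hive-site.xml from the classpath and connects to the metastore.
        val hiveContext = new HiveContext(sc)

        // The query is parsed and planned by Spark SQL; only table metadata
        // (location, SerDe, columns, types) comes from the Hive metastore.
        val df = hiveContext.sql(
          "SELECT status, count(*) AS hits FROM web_logs GROUP BY status")

        // The physical plan shows Spark operators (scans, Tungsten aggregates),
        // not MapReduce jobs.
        df.explain()
        df.show()
      }
    }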

--Regards,
Lalit

On Wed, Jun 8, 2016 at 9:59 PM, Vikash Pareek  wrote:

> Himanshu,
>
> Spark doesn't use the Hive execution engine (MapReduce) to execute the
> query. Spark only reads the metadata from the Hive metastore DB and
> executes the query within its own execution engine. This metadata is used
> by Spark's SQL execution engine (which includes components such as
> Catalyst and Tungsten to optimize queries) to execute the query and
> generate results faster than Hive (MapReduce).
>
> Using HiveContext means connecting to the Hive metastore DB. HiveContext
> can therefore access the Hive metadata, which includes the location of the
> data, serializers/deserializers, compression codecs, columns, data types,
> etc. Thus, Spark has enough information about the Hive tables and their
> data to execute the query on its own execution engine.
>
> Overall, Spark replaces the MapReduce model completely with its in-memory
> (RDD) computation engine.
>
> - Vikash Pareek
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/When-queried-through-hiveContext-does-hive-executes-these-queries-using-its-execution-engine-default-tp27114p27117.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>


Re: Cannot use a UDF in HiveThriftServer2

2016-05-30 Thread lalit sharma
Can you try adding the jar to the SPARK_CLASSPATH environment variable?
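If that does not help, one workaround sketch (not verified against 1.5.1, and
assuming the jar path and class name from your message) is to start the Thrift
server programmatically from a HiveContext that already has the jar shipped
and the function registered, via HiveThriftServer2.startWithContext; whether
new beeline sessions then see the temporary function depends on how the server
shares that underlying session:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    object ThriftServerWithUdf {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("thrift-server-with-udf")
          // Ship the UDF jar to the driver and executors.
          .set("spark.jars",
            "hdfs:///warehouse/dmpv3.db/datafile/libjars/dmp-udf-0.0.1-SNAPSHOT.jar")
        val sc = new SparkContext(conf)
        val hiveContext = new HiveContext(sc)

        // Register the function in the context the Thrift server will serve.
        hiveContext.sql(
          "CREATE TEMPORARY FUNCTION URLEncode AS 'com.dmp.hive.udfs.utils.URLEncode'")

        // Expose this context over JDBC so beeline connects to it.
        HiveThriftServer2.startWithContext(hiveContext)
      }
    }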

On Mon, May 30, 2016 at 9:55 PM, 喜之郎 <251922...@qq.com> wrote:

> Hi all, I have a problem when using HiveServer2 and beeline.
> When I use CLI mode, the UDF works well.
> But when I switch to HiveServer2 and beeline, the UDF does not work.
> My Spark version is 1.5.1.
> I tried two methods. First:
> ##
> add jar /home/hadoop/dmp-udf-0.0.1-SNAPSHOT.jar;
> create temporary function URLEncode as "com.dmp.hive.udfs.utils.URLEncode";
>
> Errors:
> Error: org.apache.spark.sql.AnalysisException: undefined function
> URLEncode; line 1 pos 207 (state=,code=0)
>
>
> Second:
> create temporary function URLEncode as 'com.dmp.hive.udfs.utils.URLEncode'
> using jar
> 'hdfs:///warehouse/dmpv3.db/datafile/libjars/dmp-udf-0.0.1-SNAPSHOT.jar';
>
> The error is the same:
> Error: org.apache.spark.sql.AnalysisException: undefined function
> URLEncode; line 1 pos 207 (state=,code=0)
>
> ###
>
> Can anyone give some suggestions? Or how should a UDF be used in
> HiveServer2/beeline mode?
>
>
>


Re: Not able to pass 3rd-party jars to Mesos executors

2016-05-11 Thread lalit sharma
A point to note, from the docs as well:

"Note that jars or python files that are passed to spark-submit should be
URIs reachable by Mesos slaves, as the Spark driver doesn't automatically
upload local jars."
http://spark.apache.org/docs/latest/running-on-mesos.html
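A minimal sketch of that point, reusing the (hypothetical) HDFS path from the
question below: reference the jar by a URI every Mesos slave can reach, either
with --jars hdfs://... on spark-submit or with the equivalent spark.jars
setting in code.

    import org.apache.spark.{SparkConf, SparkContext}

    object MesosJarsExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("mesos-jars-example")
          // Cluster-reachable URI; the driver will not upload a local jar for you.
          .set("spark.jars", "hdfs://namenode:8082/user/path/to/jar")
        val sc = new SparkContext(conf)

        // Code that needs classes from the shipped jar runs on the executors here.
        sc.stop()
      }
    }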

On Wed, May 11, 2016 at 10:05 PM, Giri P  wrote:

> I'm not using docker
>
> On Wed, May 11, 2016 at 8:47 AM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> By any chance, are you using docker to execute?
>> On 11 May 2016 21:16, "Raghavendra Pandey" 
>> wrote:
>>
>>> On 11 May 2016 02:13, "gpatcham"  wrote:
>>>
>>> >
>>>
>>> > Hi All,
>>> >
>>> > I'm using the --jars option in spark-submit to send 3rd-party jars, but
>>> > I don't see them actually being passed to the Mesos slaves. I'm getting
>>> > class-not-found exceptions.
>>> >
>>> > This is how I'm using the --jars option:
>>> >
>>> > --jars hdfs://namenode:8082/user/path/to/jar
>>> >
>>> > Am I missing something here, or what's the correct way to do this?
>>> >
>>> > Thanks
>>> >
>>> >
>>> >
>>> > --
>>> > View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Not-able-pass-3rd-party-jars-to-mesos-executors-tp26918.html
>>> > Sent from the Apache Spark User List mailing list archive at
>>> Nabble.com.
>>> >
>>>
>>
>