RE: Spark SQL and Hive tables

2014-07-25 Thread sstilak
Thanks!  Will do.


Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone

Original message
From: Michael Armbrust
Date: 07/25/2014 3:24 PM (GMT-08:00)
To: user@spark.apache.org
Subject: Re: Spark SQL and Hive tables

>
> [S]ince Hive has a large number of dependencies, it is not included in the
> default Spark assembly. In order to use Hive you must first run 
> ‘SPARK_HIVE=true
> sbt/sbt assembly/assembly’ (or use -Phive for maven). This command builds
> a new assembly jar that includes Hive. Note that this Hive assembly jar
> must also be present on all of the worker nodes, as they will need access
> to the Hive serialization and deserialization libraries (SerDes) in order
> to access data stored in Hive.
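
Once you have rebuilt with Hive support and deployed the new assembly jar
to the workers, the hive package should show up. A minimal sketch of what
that enables in spark-shell (using the sc that spark-shell provides):

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._

// hql() runs a HiveQL statement against the metastore and returns a SchemaRDD
hql("SHOW TABLES").collect().foreach(println)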



On Fri, Jul 25, 2014 at 3:20 PM, Sameer Tilak  wrote:

> Hi Jerry,
>
> I am having trouble with this. Maybe something is wrong with my import or
> version, etc.
>
> scala> import org.apache.spark.sql._;
> import org.apache.spark.sql._
>
> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> :24: error: object hive is not a member of package
> org.apache.spark.sql
>val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>   ^
> Here is what I see for autocompletion:
>
> scala> org.apache.spark.sql.
> Row SQLContext  SchemaRDD   SchemaRDDLike   api
> catalystcolumnarexecution   package parquet
> test
>
>
> --
> Date: Fri, 25 Jul 2014 17:48:27 -0400
>
> Subject: Re: Spark SQL and Hive tables
> From: chiling...@gmail.com
> To: user@spark.apache.org
>
>
> Hi Sameer,
>
> The blog post you referred to is about Spark SQL, but I don't think it is
> meant to guide you through reading data from Hive via Spark SQL. So don't
> worry too much about the blog post.
>
> The programming guide I referred to demonstrates how to read data from Hive
> using Spark SQL. It is a good starting point.
>
> Best Regards,
>
> Jerry
>
>
> On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak  wrote:
>
> Hi Michael,
> Thanks. I am not creating a HiveContext; I am creating a SQLContext. I am
> using CDH 5.1. Can you please let me know which conf/ directory you are
> talking about?
>
> --
> From: mich...@databricks.com
> Date: Fri, 25 Jul 2014 14:34:53 -0700
>
> Subject: Re: Spark SQL and Hive tables
> To: user@spark.apache.org
>
>
> In particular, have you put your hive-site.xml in the conf/ directory?
>  Also, are you creating a HiveContext instead of a SQLContext?
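>
> With a Hive-enabled assembly, a sketch of your query via HiveContext (note
> that once the tables are aliased as p and d, the SELECT list should use
> those aliases rather than the full table names):
>
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import hiveContext._
> val trainingDataTable = hql("""SELECT p.prod_num, d.gender, d.birth_year,
>   d.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")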
>
>
> On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam  wrote:
>
> Hi Sameer,
>
> Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
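>
> The Hive example in that guide goes roughly like this (the src table and
> the kv1.txt path are the guide's own sample data):
>
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import hiveContext._
> hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> // Queries are expressed in HiveQL and return SchemaRDDs
> hql("FROM src SELECT key, value").collect().foreach(println)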
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak  wrote:
>
> Hi All,
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
> demographics.birth_year, demographics.income_group  FROM prod p JOIN
> demographics d ON d.user_id = p.user_id""")
>
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
> I have these tables in Hive; I used the SHOW TABLES command to confirm this.
> Can someone please let me know how to make them accessible here?


RE: CoarseGrainedExecutorBackend: Driver Disassociated

2014-07-08 Thread sstilak
Hi Aaron,
I have 4 nodes - 1 master and 3 workers. I am not setting the driver's public
DNS name anywhere. I didn't see that step in the documentation -- maybe I
missed it. Can you please point me in the right direction?


Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone

Original message
From: Aaron Davidson
Date: 07/08/2014 12:00 PM (GMT-08:00)
To: user@spark.apache.org
Subject: Re: CoarseGrainedExecutorBackend: Driver Disassociated

Hmm, looks like the Executor is trying to connect to the driver on
localhost, from this line:
14/07/08 11:07:13 INFO CoarseGrainedExecutorBackend: Connecting to driver:
akka.tcp://spark@localhost:39701/user/CoarseGrainedScheduler

What is your setup? Standalone mode with 4 separate machines? Are you
configuring the driver's public DNS name somewhere?
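
If the driver box resolves its own hostname to localhost, one workaround is
to set spark.driver.host explicitly so the executors are handed an address
they can reach. A sketch (the host names are placeholders for your cluster):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")
  .setAppName("MyApp")
  // Advertise an address the workers can resolve, instead of localhost:
  .set("spark.driver.host", "driver-host.example.com")
val sc = new SparkContext(conf)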


On Tue, Jul 8, 2014 at 11:52 AM, Sameer Tilak  wrote:

> Dear All,
>
> When I look inside the following directory on my worker node:
> $SPARK_HOME/work/app-20140708110707-0001/3
>
> I see the following error message:
>
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.conf.Configuration).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> 14/07/08 11:07:11 INFO SparkHadoopUtil: Using Spark's default log4j
> profile: org/apache/spark/log4j-defaults.properties
> 14/07/08 11:07:11 INFO SecurityManager: Changing view acls to: p529444
> 14/07/08 11:07:11 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(p529444)
> 14/07/08 11:07:12 INFO Slf4jLogger: Slf4jLogger started
> 14/07/08 11:07:12 INFO Remoting: Starting remoting
> 14/07/08 11:07:13 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkexecu...@pzxnvm2022.dcld.pldc.kp.org:34679]
> 14/07/08 11:07:13 INFO Remoting: Remoting now listens on addresses:
> [akka.tcp://sparkexecu...@pzxnvm2022.x.y.name.org:34679]
> 14/07/08 11:07:13 INFO CoarseGrainedExecutorBackend: Connecting to driver:
> akka.tcp://spark@localhost:39701/user/CoarseGrainedScheduler
> 14/07/08 11:07:13 INFO WorkerWatcher: Connecting to worker akka.tcp://
> sparkwor...@pzxnvm2022.x.y.name.org:37054/user/Worker
> 14/07/08 11:07:13 ERROR CoarseGrainedExecutorBackend: Driver Disassociated
> [akka.tcp://sparkexecu...@pzxnvm2022.dcld.pldc.kp.org:34679] -> [akka
>
>
> I am not sure what the problem is, but it is preventing me from getting the
> 4-node test cluster up and running.
>
>


RE: Spark and Hadoop cluster

2014-03-21 Thread sstilak
Thanks,  Mayur.


Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone

Original message
From: Mayur Rustagi
Date: 03/21/2014 11:32 AM (GMT-08:00)
To: user@spark.apache.org
Subject: Re: Spark and Hadoop cluster

Both are quite stable. YARN support is still in beta, though, so it would be
good to test on standalone mode until Spark 1.0.0.
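
A standalone-mode job can still read from HDFS on the same cluster; a rough
sketch (the master URL and HDFS path are placeholders):

import org.apache.spark.SparkContext

val sc = new SparkContext("spark://master-host:7077", "HdfsCount")
// Standalone Spark reads HDFS data like any Hadoop-compatible client:
val lines = sc.textFile("hdfs://namenode:8020/path/to/input")
println(lines.count())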
Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Fri, Mar 21, 2014 at 2:19 PM, Sameer Tilak  wrote:

> Hi everyone,
> We are planning to set up Spark. The documentation mentions that it is
> possible to run Spark in standalone mode on a Hadoop cluster. Does anyone
> have any comments on stability and performance of this mode?
>