Re: Spark-shell throws Hive error when SQLContext.parquetFile, v1.3

2015-09-10 Thread Mohammad Islam
In addition to Cheng's comment --
I have seen a similar problem when hive-site.xml is not on the classpath. A 
proper stack trace can pinpoint the problem.

In the meantime, you can add it to the classpath through your environment, 
e.g. HADOOP_CLASSPATH or HADOOP_CONF_DIR (export HADOOP_CONF_DIR=/etc/hive/conf/).
See more at 
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_rn_spark_ki.html
and look for "Spark not automatically picking up hive-site.xml".


On Thursday, September 10, 2015 5:01 AM, Cheng Lian wrote:

If you don't need to interact with Hive, you may compile Spark without 
using the -Phive flag to eliminate Hive dependencies. In this way, the 
sqlContext instance in Spark shell will be of type SQLContext instead of 
HiveContext.

The Hive metastore error itself is probably due to Hive misconfiguration.
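
If you just need to read Parquet, here is a minimal sketch (Spark 1.3.x API; the path 
below is hypothetical) that uses a plain SQLContext so no Hive metastore is involved:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // A plain SQLContext has no Hive dependencies.
    val sc = new SparkContext(new SparkConf().setAppName("ParquetWithoutHive"))
    val sqlContext = new SQLContext(sc)

    // parquetFile is the 1.3.x API; 1.4+ prefers sqlContext.read.parquet(dir)
    val df = sqlContext.parquetFile("/path/to/parquet")
    df.printSchema()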

Cheng


On 9/10/15 6:02 PM, Petr Novak wrote:
> Hello,
>
> sqlContext.parquetFile(dir)
>
> throws the exception "Unable to instantiate 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient"
>
> The strange thing is that on the second attempt to open the file it is 
> successful:
>
> try {
>   sqlContext.parquetFile(dir)
> } catch {
>   case e: Exception => sqlContext.parquetFile(dir)
> }
>
> What should I do to make my script run flawlessly in spark-shell 
> when opening Parquet files? It is probably missing some dependency. Or 
> how should I write the code? This double attempt is awful, and 
> I don't need HiveMetaStoreClient; I just need to open a Parquet file.
>
> Many thanks for any idea,
> Petr
>
>





Re: Creating Parquet external table using HiveContext API

2015-09-10 Thread Mohammad Islam
Thanks a lot, Michael, for the solution.
If I want to provide my own schema, can I do that?


On Thursday, September 10, 2015 11:05 AM, Michael Armbrust <mich...@databricks.com> wrote:

Easiest is to just use SQL:

hiveContext.sql("CREATE TABLE <tableName> USING parquet OPTIONS (path '<path-to-table>')")

When you specify the path it's automatically created as an external table. The 
schema will be discovered.
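
For the follow-up question about supplying your own schema, here is a minimal sketch 
(Spark 1.3/1.4 APIs; the table name, path, and columns are hypothetical) using the 
createExternalTable() overload that takes a StructType:

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.types.{StructType, StructField, LongType, StringType}

    val hiveContext = new HiveContext(sc)

    // Explicit schema instead of relying on schema discovery.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    // Registers an external table backed by the existing Parquet directory.
    hiveContext.createExternalTable(
      "my_table",
      "parquet",
      schema,
      Map("path" -> "/data/parquet/my_table"))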
On Wed, Sep 9, 2015 at 9:33 PM, Mohammad Islam <misla...@yahoo.com.invalid> 
wrote:

Hi,
I want to create an external Hive table using HiveContext. I have the following:
1. the full path/location of the Parquet data directory
2. the name of the new table
3. I can get the schema as well.

Which API will be the best (for 1.3.x or 1.4.x)? I can see 6 createExternalTable() 
APIs but I am not sure which one will be the best. I didn't find any good 
documentation in the source code or Javadoc about the parameters of the APIs 
(i.e. path, source, options, etc.). Any help will be appreciated.

Regards,
Mohammad

Creating Parquet external table using HiveContext API

2015-09-09 Thread Mohammad Islam
Hi,
I want to create an external Hive table using HiveContext. I have the following:
1. the full path/location of the Parquet data directory
2. the name of the new table
3. I can get the schema as well.

Which API will be the best (for 1.3.x or 1.4.x)? I can see 6 createExternalTable() 
APIs but I am not sure which one will be the best. I didn't find any good 
documentation in the source code or Javadoc about the parameters of the APIs 
(i.e. path, source, options, etc.). Any help will be appreciated.

Regards,
Mohammad


Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-05-26 Thread Mohammad Islam
I got a similar problem. I'm not sure if your problem has already been resolved.
For the record, I solved this type of error by calling setMaster("yarn-cluster") 
on the SparkConf. If you find the solution, please let us know.
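
A minimal sketch of that (the app name is hypothetical; it assumes the job runs on 
YARN in cluster mode):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Set the master explicitly before the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("HiveContextOnYarn")
      .setMaster("yarn-cluster")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
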
Regards,
Mohammad




On Friday, March 6, 2015 2:47 PM, nitinkak001 <nitinkak...@gmail.com> wrote:

I am trying to run a Hive query from Spark using HiveContext. Here is the code:

    val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")

    conf.set("spark.executor.extraClassPath",
      "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
    conf.set("spark.driver.extraClassPath",
      "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
    conf.set("spark.yarn.am.waitTime", "30")

    val sc = new SparkContext(conf)

    val sqlContext = new HiveContext(sc)

    def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user")

    inputRDD.collect().foreach { println }

    println(inputRDD.schema.getClass.getName)

Getting this exception. Any clues? The weird part is that if I try to do the same
thing in Java instead of Scala, it runs fine.

Exception in thread "Driver" java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
initialize after waiting for 10000 ms. Please check earlier log output for
errors. Failing the application.
Exception in thread "main" java.lang.NullPointerException
    at
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
    at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
    at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
    at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
    at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal.





Passing Java Options to Spark AM launching

2014-12-01 Thread Mohammad Islam
Hi,
How do I pass Java options (such as -XX:MaxMetaspaceSize=100M) when launching 
the AM or the task containers?

This is related to running Spark on YARN (Hadoop 2.3.0). In the MapReduce case, 
setting a property such as mapreduce.map.java.opts would do the job.
Any help would be highly appreciated.

Regards,
Mohammad



 

Re: Passing Java Options to Spark AM launching

2014-12-01 Thread Mohammad Islam
Thanks, Tobias, for the answer. Does it work for the driver as well?

Regards,
Mohammad

On Monday, December 1, 2014 5:30 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

Hi,
have a look at the documentation for spark.driver.extraJavaOptions (which seems 
to have disappeared since I looked it up last week) and 
spark.executor.extraJavaOptions at 
http://spark.apache.org/docs/latest/configuration.html#runtime-environment.
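
A minimal sketch (the app name is hypothetical) of passing an extra JVM option to 
the executor containers via SparkConf; the driver's own options generally have to 
be supplied before the driver JVM starts, e.g. through spark-submit 
--driver-java-options or spark-defaults.conf, rather than from application code:

    import org.apache.spark.{SparkConf, SparkContext}

    // Extra JVM options for the executor containers launched on YARN.
    val conf = new SparkConf()
      .setAppName("JavaOptionsExample")
      .set("spark.executor.extraJavaOptions", "-XX:MaxMetaspaceSize=100M")
    val sc = new SparkContext(conf)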
Tobias