Re: Spark-shell throws Hive error when SQLContext.parquetFile, v1.3
In addition to Cheng's comment -- I found a similar problem when hive-site.xml is not on the classpath. A proper stack trace can pinpoint the problem. In the meantime, you can add it to your environment through HADOOP_CLASSPATH (e.g. export HADOOP_CONF_DIR=/etc/hive/conf/). See http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_rn_spark_ki.html and look for "Spark not automatically picking up hive-site.xml".

On Thursday, September 10, 2015 5:01 AM, Cheng Lian wrote:
If you don't need to interact with Hive, you may compile Spark without the -Phive flag to eliminate the Hive dependencies. That way, the sqlContext instance in the Spark shell will be of type SQLContext instead of HiveContext. The Hive metastore error is probably due to Hive misconfiguration.
Cheng

On 9/10/15 6:02 PM, Petr Novak wrote:
> Hello,
>
> sqlContext.parquetFile(dir)
>
> throws the exception "Unable to instantiate
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient"
>
> The strange thing is that on the second attempt to open the file it is
> successful:
>
> try {
>   sqlContext.parquetFile(dir)
> } catch {
>   case e: Exception => sqlContext.parquetFile(dir)
> }
>
> What should I do to make my script run flawlessly in spark-shell when
> opening Parquet files? It is probably missing some dependency. Or how
> should I write the code? This double attempt is awful, and I don't need
> HiveMetaStoreClient, I just need to open a Parquet file.
>
> Many thanks for any idea,
> Petr
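A minimal sketch of the no-Hive route in the spark-shell, assuming Spark 1.3.x and a hypothetical Parquet directory; it constructs a plain SQLContext so the Hive metastore is never touched:

    // Sketch only: use a plain SQLContext instead of the HiveContext that
    // spark-shell provides by default when Spark is built with -Phive.
    import org.apache.spark.sql.SQLContext

    val plainSqlContext = new SQLContext(sc)                    // no Hive metastore involved
    val df = plainSqlContext.parquetFile("/path/to/parquet")    // hypothetical path
    df.printSchema()
    df.show()

The trade-off is that a plain SQLContext cannot query Hive tables, which is exactly why this only helps when no Hive interaction is needed.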
Re: Creating Parquet external table using HiveContext API
Thanks a lot, Michael, for the solution. If I want to provide my own schema, can I do that?

On Thursday, September 10, 2015 11:05 AM, Michael Armbrust <mich...@databricks.com> wrote:
Easiest is to just use SQL:

hiveContext.sql("CREATE TABLE USING parquet OPTIONS (path '')")

When you specify the path, it's automatically created as an external table. The schema will be discovered.

On Wed, Sep 9, 2015 at 9:33 PM, Mohammad Islam <misla...@yahoo.com.invalid> wrote:
Hi, I want to create an external Hive table using HiveContext. I have the following:
1. the full path/location of the Parquet data directory
2. the name of the new table
3. I can get the schema as well.
Which API will be best (for 1.3.x or 1.4.x)? I can see six createExternalTable() APIs, but I'm not sure which one will be best. I didn't find any good documentation in the source code or Javadoc about the parameters of these APIs (i.e. path, source, options, etc.). Any help will be appreciated.
Regards,
Mohammad
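Regarding the follow-up about supplying your own schema: one of the createExternalTable() overloads accepts an explicit StructType. A sketch under the assumption of the Spark 1.3.x/1.4.x HiveContext API; the table name, columns, and path are hypothetical:

    import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

    // Hypothetical schema for the Parquet data.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)))

    // createExternalTable(tableName, source, schema, options) registers the
    // table against the given path instead of copying data into the warehouse.
    hiveContext.createExternalTable(
      "my_external_table",                       // hypothetical table name
      "parquet",                                 // data source
      schema,
      Map("path" -> "/data/parquet/my_table"))   // hypothetical location

If the Parquet files already carry the schema you want, the SQL form Michael shows (without an explicit schema) is simpler, since the schema is discovered from the file footers.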
Creating Parquet external table using HiveContext API
Hi, I want to create an external Hive table using HiveContext. I have the following:
1. the full path/location of the Parquet data directory
2. the name of the new table
3. I can get the schema as well.
Which API will be best (for 1.3.x or 1.4.x)? I can see six createExternalTable() APIs, but I'm not sure which one will be best. I didn't find any good documentation in the source code or Javadoc about the parameters of these APIs (i.e. path, source, options, etc.). Any help will be appreciated.
Regards,
Mohammad
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
I got a similar problem. I'm not sure if your problem is already resolved. For the record, I solved this type of error by calling setMaster("yarn-cluster") on the SparkConf. If you find the solution, please let us know.
Regards,
Mohammad

On Friday, March 6, 2015 2:47 PM, nitinkak001 <nitinkak...@gmail.com> wrote:
I am trying to run a Hive query from Spark using HiveContext. Here is the code:

val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")
conf.set("spark.executor.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.driver.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.yarn.am.waitTime", "30L")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user")
inputRDD.collect().foreach { println }
println(inputRDD.schema.getClass.getName)

Getting this exception. Any clues? The weird part is that if I try to do the same thing in Java instead of Scala, it runs fine.

Exception in thread "Driver" java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 10000 ms. Please check earlier log output for errors. Failing the application.
Exception in thread "main" java.lang.NullPointerException
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
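For reference, a sketch of what the workaround mentioned above looks like when put together, assuming Spark 1.2/1.3 on YARN; the wait-time value is illustrative and the query is the one from this thread, so treat this as a starting point rather than a verified fix:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf()
      .setAppName("HiveSparkIntegrationTest")
      .setMaster("yarn-cluster")                  // the workaround: set the master explicitly
      .set("spark.yarn.am.waitTime", "300000")    // give the AM more time (ms); illustrative value

    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("describe spark_poc.src_digital_profile_user").collect().foreach(println)

Whether setting the master in code or on the spark-submit command line is appropriate depends on how the job is launched.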
Passing Java Options to Spark AM launching
Hi, how do I pass Java options (such as -XX:MaxMetaspaceSize=100M) when launching the AM or task containers? This is related to running Spark on YARN (Hadoop 2.3.0). In the MapReduce case, setting a property such as mapreduce.map.java.opts does the job. Any help would be highly appreciated.
Regards,
Mohammad
Re: Passing Java Options to Spark AM launching
Thanks, Tobias, for the answer. Does it work for the driver as well?
Regards,
Mohammad

On Monday, December 1, 2014 5:30 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
Hi, have a look at the documentation for spark.driver.extraJavaOptions (which seems to have disappeared since I looked it up last week) and spark.executor.extraJavaOptions at http://spark.apache.org/docs/latest/configuration.html#runtime-environment.
Tobias
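To make the suggestion concrete, a sketch of setting both properties programmatically; the metaspace flag is simply the value from the original question, and the app name is hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("JavaOptsExample")                                         // hypothetical app name
      .set("spark.executor.extraJavaOptions", "-XX:MaxMetaspaceSize=100M")   // task containers (executors)
      .set("spark.driver.extraJavaOptions", "-XX:MaxMetaspaceSize=100M")     // driver; in yarn-cluster mode the driver runs inside the AM

    val sc = new SparkContext(conf)

Note that the driver's JVM is usually already running by the time this SparkConf is built, so in practice the driver option is better supplied at submit time, for example in spark-defaults.conf or via --conf on spark-submit.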