Hi Yin,
I'm using the spark-hive dependency, and my app's tests pass on Spark 1.3.1.
It seems to be something with Hive & sbt. The following statement works from
spark-shell, but from the sbt console in RC3 I get this error:
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/05/29 16:31:06 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@177ac9f4
scala> val data = sqlContext.read.parquet("caches/-1525448137")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.lang.IllegalArgumentException: Unable to locate hive jars to connect to metastore using classloader scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set spark.sql.hive.metastore.jars
  at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
  at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
  at org.apache.spark.sql.hive.HiveContext$$anon$1.<init>(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
  at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
  at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
  at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
  at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:419)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:264)
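As a next step I'll try setting it explicitly from the sbt console, along these lines (an untested sketch; /opt/hive/lib is just a placeholder for wherever the Hive 0.13.1 jars actually live):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Untested sketch: point the metastore client at explicit jars instead
// of the interpreter's classloader. The jar path is a placeholder.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("sbt-console")
  .set("spark.sql.hive.metastore.version", "0.13.1")
  .set("spark.sql.hive.metastore.jars", "/opt/hive/lib/*")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)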
Thanks,
Peter Rudenko
On 2015-05-29 07:08, Yin Huai wrote:
Justin,
If you are creating multiple HiveContexts in tests, you need to assign
a temporary metastore location to every HiveContext (like what we do
here
<https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L527-L543>).
Otherwise, they all try to connect to the metastore in the current
directory (look for a metastore_db folder).
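Roughly, the pattern looks like this (an untested sketch; newTestHiveContext is a made-up helper, and the linked code builds the configuration up front rather than calling setConf after construction):

import java.io.File
import java.nio.file.Files
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Untested sketch: give each test HiveContext its own Derby metastore
// and warehouse directory so runs don't fight over ./metastore_db.
def newTestHiveContext(sc: SparkContext): HiveContext = {
  val tempDir = Files.createTempDirectory("spark-metastore").toFile
  val hc = new HiveContext(sc)
  // Set these before anything touches the metastore (metadataHive is lazy).
  hc.setConf("javax.jdo.option.ConnectionURL",
    s"jdbc:derby:;databaseName=${new File(tempDir, "metastore_db").getAbsolutePath};create=true")
  hc.setConf("hive.metastore.warehouse.dir",
    new File(tempDir, "warehouse").getAbsolutePath)
  hc
}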
Peter,
Do you also have the same use case as Justin (creating multiple
HiveContexts in tests)? Can you explain what you meant by "all tests"?
I am probably missing some context here.
Thanks,
Yin
On Thu, May 28, 2015 at 11:28 AM, Peter Rudenko
<petro.rude...@gmail.com> wrote:
I also have the same issue: all tests fail because of a HiveContext /
Derby lock.
Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
[info] java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$anon$1@8066e0e, see the next exception for details.
Also, is there a build for Hadoop 2.6? I don't see it here:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
Thanks,
Peter Rudenko
On 2015-05-22 22:56, Justin Uang wrote:
I'm working on one of the Palantir teams using Spark, and here is
our feedback:
We have encountered three issues when upgrading to Spark 1.4.0.
I'm not sure they qualify as a -1, as they come from using
non-public APIs and multiple Spark contexts for the purposes of
testing, but I do want to bring them up for awareness =)
1. Our UDT was serializing to a StringType, but strings are now
   represented internally as UTF8String, so we had to change our
   UDT to use UTF8String.apply() and UTF8String.toString() to
   convert to and from String (a rough sketch follows after the
   stack trace below).
2. createDataFrame when using UDTs used to accept values in the
   serialized Catalyst form. Now they're supposed to be in the
   UDT's Java class form (I think this change would already have
   affected us in 1.3.1, since we were on 1.3.0).
3. A Derby database lifecycle management issue with HiveContext.
   We have been using a SparkContextResource JUnit Rule that we
   wrote; it sets up and then tears down a SparkContext and
   HiveContext between unit test runs within the same process
   (possibly the same thread as well). Multiple contexts are not
   being used at once. It used to work in 1.3.0, but now when we
   try to create the HiveContext for the second unit test, it
   complains with the following exception. I have a feeling it
   might have something to do with the Hive object being
   thread-local, and us not explicitly closing the HiveContext
   and everything it holds. The full stack trace is here:
   https://gist.github.com/justinuang/0403d49cdeedf91727cd
Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$anon$1@5dea2446, see the next exception for details.
  at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
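To illustrate item 1, here is a rough sketch of the kind of change we made (MyLabel and MyLabelUDT are made-up names, not our actual classes):

import org.apache.spark.sql.types._

// Illustrative only: a minimal UDT backed by StringType. In 1.4,
// StringType values are UTF8String internally, not java.lang.String.
class MyLabel(val value: String) extends Serializable

class MyLabelUDT extends UserDefinedType[MyLabel] {
  override def sqlType: DataType = StringType
  override def userClass: Class[MyLabel] = classOf[MyLabel]

  override def serialize(obj: Any): Any = obj match {
    case l: MyLabel => UTF8String(l.value)        // was: l.value
  }
  override def deserialize(datum: Any): MyLabel = datum match {
    case s: UTF8String => new MyLabel(s.toString) // was: case s: String
  }
}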
On Wed, May 20, 2015 at 10:35 AM Imran Rashid
<iras...@cloudera.com> wrote:
-1
I discovered I accidentally removed the master & worker JSON
endpoints; will restore:
https://issues.apache.org/jira/browse/SPARK-7760
On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell
<pwend...@gmail.com> wrote:
Please vote on releasing the following candidate as
Apache Spark version 1.4.0!
The tag to be voted on is v1.4.0-rc1 (commit 777a081):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e
The release files, including signatures, digests, etc.
can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1/
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1092/
The documentation corresponding to this release can be
found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/
Please vote on releasing this package as Apache Spark 1.4.0!
The vote is open until Friday, May 22, at 17:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...
To learn more about Apache Spark, please see
http://spark.apache.org/
== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload, running it on this release
candidate, and reporting any regressions.
== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions
from 1.3.1. Bugs already present in 1.3.X, minor regressions,
or bugs related to new features will not block this release.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org