Spark 1.6: Why Including hive-jdbc in assembly when -Phive-provided is set?

2016-02-03 Thread Andrew Lee
Hi All,


I have a question regarding the hive-jdbc library that is being included in the 
assembly JAR.


Build command:

mvn -U -X -Phadoop-2.6 -Phadoop-provided -Phive-provided -Pyarn 
-Phive-thriftserver -Psparkr -DskipTests install


In the pom.xml file, the scope for the Hive JARs is set to 'compile'; however, there 
is one entry

https://github.com/apache/spark/blob/branch-1.6/pom.xml#L1414

that includes it again.


Running 'jar tf' on the assembly JAR shows the following content:


org/apache/hive/
org/apache/hive/jdbc/
org/apache/hive/jdbc/HiveDatabaseMetaData.class
org/apache/hive/jdbc/ZooKeeperHiveClientHelper.class
org/apache/hive/jdbc/ZooKeeperHiveClientHelper$DummyWatcher.class
org/apache/hive/jdbc/HiveQueryResultSet$Builder.class
org/apache/hive/jdbc/HiveResultSetMetaData.class
org/apache/hive/jdbc/HivePreparedStatement.class
org/apache/hive/jdbc/HiveStatement$1.class
org/apache/hive/jdbc/JdbcUriParseException.class
org/apache/hive/jdbc/HiveDataSource.class
org/apache/hive/jdbc/HttpBasicAuthInterceptor.class
org/apache/hive/jdbc/JdbcColumn.class
org/apache/hive/jdbc/Utils$JdbcConnectionParams.class
org/apache/hive/jdbc/HiveMetaDataResultSet.class
org/apache/hive/jdbc/HiveDriver.class
org/apache/hive/jdbc/JdbcTable.class
org/apache/hive/jdbc/HiveBaseResultSet.class
org/apache/hive/jdbc/HiveDatabaseMetaData$GetTablesComparator.class
org/apache/hive/jdbc/HiveDatabaseMetaData$1.class
org/apache/hive/jdbc/HiveStatement.class
org/apache/hive/jdbc/ZooKeeperHiveClientException.class
org/apache/hive/jdbc/HiveQueryResultSet$1.class
org/apache/hive/jdbc/Utils.class
org/apache/hive/jdbc/HiveConnection$1.class
org/apache/hive/jdbc/JdbcColumn$1.class
org/apache/hive/jdbc/HiveBaseResultSet$1.class
org/apache/hive/jdbc/HttpKerberosRequestInterceptor.class
org/apache/hive/jdbc/JdbcColumnAttributes.class
org/apache/hive/jdbc/HiveCallableStatement.class
org/apache/hive/jdbc/HiveDatabaseMetaData$GetColumnsComparator.class
org/apache/hive/jdbc/ClosedOrCancelledStatementException.class
org/apache/hive/jdbc/HiveQueryResultSet.class
org/apache/hive/jdbc/HttpRequestInterceptorBase.class
org/apache/hive/jdbc/HiveConnection.class
org/apache/hive/service/
org/apache/hive/service/server/
org/apache/hive/service/server/HiveServerServerOptionsProcessor.class


I would like to know why this is included. Can we remove it from the assembly and 
link hive-jdbc at runtime instead?
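
For reference, a minimal scripted version of the same check (a sketch only; the 
assembly path below is an example, not the actual build output name) that lists the 
hive-jdbc entries from a Scala REPL or spark-shell:

import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Hypothetical check: list the org/apache/hive/jdbc entries bundled in a given assembly JAR.
val assembly = new JarFile("/path/to/spark-assembly-1.6.0-hadoop2.6.0.jar") // example path
assembly.entries().asScala
  .map(_.getName)
  .filter(_.startsWith("org/apache/hive/jdbc/"))
  .foreach(println)
assembly.close()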






Re: Spark 1.4.2 release and votes conversation?

2015-11-16 Thread Andrew Lee
I did, and it passes all of our test cases, so I'm wondering what I missed. I know 
there is the memory-leak spill JIRA, SPARK-11293, but I'm not sure whether that will 
go into 1.4.2 or 1.4.3, etc.




From: Reynold Xin <r...@databricks.com>
Sent: Friday, November 13, 2015 1:31 PM
To: Andrew Lee
Cc: dev@spark.apache.org
Subject: Re: Spark 1.4.2 release and votes conversation?

In the interim, you can just build it off branch-1.4 if you want.


On Fri, Nov 13, 2015 at 1:30 PM, Reynold Xin 
<r...@databricks.com<mailto:r...@databricks.com>> wrote:
I actually tried to build a binary for 1.4.2 and wanted to start voting, but 
there was an issue with the release script that failed the Jenkins job. It would 
be great to kick off a 1.4.2 release.


On Fri, Nov 13, 2015 at 1:00 PM, Andrew Lee 
<alee...@hotmail.com<mailto:alee...@hotmail.com>> wrote:

Hi All,


I'm wondering whether Spark 1.4.2 has been voted on by any chance, or whether I have 
overlooked something and we are targeting 1.4.3?


By looking at the JIRA

https://issues.apache.org/jira/browse/SPARK/fixforversion/12332833/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel


All issues were resolved and there were no blockers. Does anyone know what happened 
to this release?



Or was there a recommendation to skip it and ask users to use Spark 1.5.2 
instead?




Spark 1.4.2 release and votes conversation?

2015-11-13 Thread Andrew Lee
Hi All,


I'm wondering whether Spark 1.4.2 has been voted on by any chance, or whether I have 
overlooked something and we are targeting 1.4.3?


By looking at the JIRA

https://issues.apache.org/jira/browse/SPARK/fixforversion/12332833/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel


All issues were resolved and there were no blockers. Does anyone know what happened 
to this release?



Or was there a recommendation to skip it and ask users to use Spark 1.5.2 
instead?


RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-29 Thread Andrew Lee
Hi Patrick,
I manually hardcoded the Hive version to 0.13.1a and it works. It turns out that, 
for some reason, 0.13.1 was being picked up from Maven instead of the 0.13.1a version.
So my solution was: hardcode hive.version to 0.13.1a. In my case I am building 
against Hive 0.13 only, so the pom.xml was hardcoded with that version string, and 
the final JAR is now working with hive-exec 0.13.1a embedded.
Possible reason why it didn't work? I suspect our internal environment is picking up 
0.13.1, since we use our own Maven repo as a proxy and for caching. 0.13.1a did 
appear in our own repo and was replicated from the Maven Central repo, but during 
the build process Maven picked up 0.13.1 instead of 0.13.1a.
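
As a sanity check (a hypothetical snippet, not something from the original build), 
the following can be run in spark-shell to print which JAR the Hive execution classes 
are actually loaded from, which helps confirm that the patched hive-exec 0.13.1a, 
rather than upstream 0.13.1, ended up on the runtime classpath:

// Hypothetical spark-shell check: print the code source of a hive-exec class.
val src = Class.forName("org.apache.hadoop.hive.ql.exec.Utilities")
  .getProtectionDomain.getCodeSource
println(if (src != null) src.getLocation else "unknown (no code source)")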

 Date: Wed, 10 Dec 2014 12:23:08 -0800
 Subject: Re: Build Spark 1.2.0-rc1 encounter exceptions when running 
 HiveContext - Caused by: java.lang.ClassNotFoundException: 
 com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
 From: pwend...@gmail.com
 To: alee...@hotmail.com
 CC: dev@spark.apache.org
 
 Hi Andrew,
 
 It looks like somehow you are including jars from the upstream Apache
 Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support,
 we had to modify Hive to use a different version of Kryo that was
 compatible with Spark's Kryo version.
 
 https://github.com/pwendell/hive/commit/5b582f242946312e353cfce92fc3f3fa472aedf3
 
 I would look through the actual classpath and make sure you aren't
 including your own hive-exec jar somehow.
 
 - Patrick
 
 On Wed, Dec 10, 2014 at 9:48 AM, Andrew Lee alee...@hotmail.com wrote:
  Apologies for the format; somehow it got messed up and the linefeeds were 
  removed. Here's a reformatted version.
  Hi All,
  I tried to set SPARK_CLASSPATH in spark-env.sh to include the necessary libraries, 
  the auxiliary JARs, and the datanucleus*.jar files from Hive; however, when 
  I run HiveContext, it gives me the following error:
 
  Caused by: java.lang.ClassNotFoundException: 
  com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
 
  I have checked the JARs with 'jar tf', and it looks like this class is already 
  included (shaded) in the assembly JAR (spark-assembly-1.2.0-hadoop2.4.1.jar), which 
  is already configured on the system classpath. I couldn't figure out what 
  is going on with the shading of the esotericsoftware JARs here.  Any help 
  is appreciated.
 
 
  How to reproduce the problem?
  Run the following 3 statements in spark-shell ( This is how I launched my 
  spark-shell. cd /opt/spark; ./bin/spark-shell --master yarn --deploy-mode 
  client --queue research --driver-memory 1024M)
 
  import org.apache.spark.SparkContext
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")
 
 
 
  A reference of my environment.
  Apache Hadoop 2.4.1
  Apache Hive 0.13.1
  Apache Spark branch-1.2 (installed under /opt/spark/, and config under 
  /etc/spark/)
  Maven build command:
 
  mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.4.1 
  -Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests install
 
  Source Code commit label: eb4d457a870f7a281dc0267db72715cd00245e82
 
  My spark-env.sh has the following contents when I executed spark-shell:
  HADOOP_HOME=/opt/hadoop/
  HIVE_HOME=/opt/hive/
  HADOOP_CONF_DIR=/etc/hadoop/
  YARN_CONF_DIR=/etc/hadoop/
  HIVE_CONF_DIR=/etc/hive/
  HADOOP_SNAPPY_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f 
  -name snappy-java-*.jar)
  HADOOP_LZO_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name 
  hadoop-lzo-*.jar)
  SPARK_YARN_DIST_FILES=/user/spark/libs/spark-assembly-1.2.0-hadoop2.4.1.jar
  export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
  export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
  export 
  SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_SNAPPY_JAR:$HADOOP_LZO_JAR:$HIVE_CONF_DIR:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar
 
 
  Here's what I see from my stack trace.
  warning: there were 1 deprecation warning(s); re-run with -deprecation for 
  details
  Hive history 
  file=/home/hive/log/alti-test-01/hive_job_log_b5db9539-4736-44b3-a601-04fa77cb6730_1220828461.txt
  java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9718)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9712)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at org.apache.hadoop.hive.ql.Driver.compileInternal

Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Andrew Lee
Hi All,
I tried to set SPARK_CLASSPATH in spark-env.sh to include the necessary libraries, 
the auxiliary JARs, and the datanucleus*.jar files from Hive; however, when I run 
HiveContext, it gives me the following error:
Caused by: java.lang.ClassNotFoundException: 
com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
I have checked the JARs with 'jar tf', and it looks like this class is already included 
(shaded) in the assembly JAR (spark-assembly-1.2.0-hadoop2.4.1.jar), which is 
already configured on the system classpath. I couldn't figure out what is going 
on with the shading of the esotericsoftware JARs here. Any help is appreciated.
How to reproduce the problem?
Run the following 3 statements in spark-shell (this is how I launched my spark-shell: 
cd /opt/spark; ./bin/spark-shell --master yarn --deploy-mode client --queue research 
--driver-memory 1024M):

import org.apache.spark.SparkContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")

A reference of my environment:
Apache Hadoop 2.4.1
Apache Hive 0.13.1
Apache Spark branch-1.2 (installed under /opt/spark/, and config under /etc/spark/)
Maven build command:

mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.4.1 
-Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests install
Source Code commit label: eb4d457a870f7a281dc0267db72715cd00245e82

My spark-env.sh has the following contents when I executed spark-shell:

HADOOP_HOME=/opt/hadoop/
HIVE_HOME=/opt/hive/
HADOOP_CONF_DIR=/etc/hadoop/
YARN_CONF_DIR=/etc/hadoop/
HIVE_CONF_DIR=/etc/hive/
HADOOP_SNAPPY_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name 
snappy-java-*.jar)
HADOOP_LZO_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name 
hadoop-lzo-*.jar)
SPARK_YARN_DIST_FILES=/user/spark/libs/spark-assembly-1.2.0-hadoop2.4.1.jar
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export 
SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_SNAPPY_JAR:$HADOOP_LZO_JAR:$HIVE_CONF_DIR:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar
Here's what I see from my stack trace.
warning: there were 1 deprecation warning(s); re-run with -deprecation for 
details
Hive history 
file=/home/hive/log/alti-test-01/hive_job_log_b5db9539-4736-44b3-a601-04fa77cb6730_1220828461.txt
java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9718)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9712)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:16)
    at $iwC$$iwC$$iwC.<init>(<console>:21)
    at $iwC$$iwC.<init>(<console>:23)
    at $iwC.<init>(<console>:25)
    at <init>(<console>:27)
    at .<init>(<console>:31)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at

RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Andrew Lee
Apologies for the format; somehow it got messed up and the linefeeds were removed. 
Here's a reformatted version.
Hi All,
I tried to set SPARK_CLASSPATH in spark-env.sh to include the necessary libraries, 
the auxiliary JARs, and the datanucleus*.jar files from Hive; however, when I run 
HiveContext, it gives me the following error:

Caused by: java.lang.ClassNotFoundException: 
com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

I have checked the JARs with 'jar tf', and it looks like this class is already included 
(shaded) in the assembly JAR (spark-assembly-1.2.0-hadoop2.4.1.jar), which is 
already configured on the system classpath. I couldn't figure out what is going 
on with the shading of the esotericsoftware JARs here.  Any help is appreciated.


How to reproduce the problem?
Run the following 3 statements in spark-shell (this is how I launched my 
spark-shell: cd /opt/spark; ./bin/spark-shell --master yarn --deploy-mode 
client --queue research --driver-memory 1024M):

import org.apache.spark.SparkContext
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")



A reference of my environment.
Apache Hadoop 2.4.1
Apache Hive 0.13.1
Apache Spark branch-1.2 (installed under /opt/spark/, and config under 
/etc/spark/)
Maven build command:

mvn -U -X -Phadoop-2.4 -Pyarn -Phive -Phive-0.13.1 -Dhadoop.version=2.4.1 
-Dyarn.version=2.4.1 -Dhive.version=0.13.1 -DskipTests install

Source Code commit label: eb4d457a870f7a281dc0267db72715cd00245e82

My spark-env.sh has the following contents when I executed spark-shell:
 HADOOP_HOME=/opt/hadoop/
 HIVE_HOME=/opt/hive/
 HADOOP_CONF_DIR=/etc/hadoop/
 YARN_CONF_DIR=/etc/hadoop/
 HIVE_CONF_DIR=/etc/hive/
 HADOOP_SNAPPY_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name 
 snappy-java-*.jar)
 HADOOP_LZO_JAR=$(find $HADOOP_HOME/share/hadoop/common/lib/ -type f -name 
 hadoop-lzo-*.jar)
 SPARK_YARN_DIST_FILES=/user/spark/libs/spark-assembly-1.2.0-hadoop2.4.1.jar
 export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
 export 
 SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_SNAPPY_JAR:$HADOOP_LZO_JAR:$HIVE_CONF_DIR:/opt/hive/lib/datanucleus-api-jdo-3.2.6.jar:/opt/hive/lib/datanucleus-core-3.2.10.jar:/opt/hive/lib/datanucleus-rdbms-3.2.9.jar


 Here's what I see from my stack trace.
 warning: there were 1 deprecation warning(s); re-run with -deprecation for 
 details
 Hive history 
 file=/home/hive/log/alti-test-01/hive_job_log_b5db9539-4736-44b3-a601-04fa77cb6730_1220828461.txt
 java.lang.NoClassDefFoundError: com/esotericsoftware/shaded/org/objenesis/strategy/InstantiatorStrategy
    at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:925)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9718)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:9712)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:16)
    at $iwC$$iwC$$iwC.<init>(<console>:21)
    at $iwC$$iwC.<init>(<console>:23)
    at $iwC.<init>(<console>:25)
    at <init>(<console>:27)
    at .<init>(<console>:31)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
 

RE: Working Formula for Hive 0.13?

2014-08-25 Thread Andrew Lee
From my perspective, there are a few benefits to Hive 0.13.1+. The following are the 
four major reasons I see people asking to upgrade to Hive 0.13.1 recently.
1. Performance improvements, bug fixes, and patches (the usual case).
2. Native support for the Parquet format, so there is no need to provide custom JARs 
and a SerDe as with Hive 0.12 (depends; driven by data format and queries). See the 
sketch after this list.
3. Support for the Tez engine, which gives performance improvements in several use 
cases.
4. Security enhancements, which have improved a lot in Hive 0.13.1 (security concerns, 
ACLs, etc.).
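
To illustrate item 2, here is a minimal sketch (the table and column names are 
hypothetical, and it assumes a spark-shell built against Hive 0.13.x, where sc is the 
SparkContext): with Hive 0.13 a table can be declared STORED AS PARQUET directly, 
whereas Hive 0.12 required registering the external Parquet SerDe JARs first.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Native Parquet storage handler, available starting with Hive 0.13:
hiveContext.hql("CREATE TABLE IF NOT EXISTS parquet_demo (key INT, value STRING) STORED AS PARQUET")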
These are the major benefits I see in upgrading from Hive 0.12.0 to Hive 0.13.1+.
There may be others out there that I'm not aware of, but I do see the upgrade coming.
My 2 cents.
 From: mich...@databricks.com
 Date: Mon, 25 Aug 2014 13:08:42 -0700
 Subject: Re: Working Formula for Hive 0.13?
 To: wangf...@huawei.com
 CC: dev@spark.apache.org
 
 Thanks for working on this!  It's unclear at the moment exactly how we are
 going to handle this, since the end goal is to be compatible with as many
 versions of Hive as possible.  That said, I think it would be great to open
 a PR in this case.  Even if we don't merge it, that's a good way to get it
 on people's radar and have a discussion about the changes that are required.
 
 
 On Sun, Aug 24, 2014 at 7:11 PM, scwf wangf...@huawei.com wrote:
 
  I have been working on a branch that updates the Hive version to hive-0.13 (by
  org.apache.hive): https://github.com/scwf/spark/tree/hive-0.13
  I am wondering whether it's OK to make a PR now, because the hive-0.13 version
  is not compatible with hive-0.12, and here I used org.apache.hive.
 
 
 
  On 2014/7/29 8:22, Michael Armbrust wrote:
 
  A few things:
   - When we upgrade to Hive 0.13.0, Patrick will likely republish the hive-exec
  jar just as we did for 0.12.0.
   - Since we have to tie into some pretty low-level APIs, it is unsurprising
  that the code doesn't just compile out of the box against 0.13.0.
   - ScalaReflection is for determining a Schema from Scala classes, not
  reflection-based bridge code.  Either way, it's unclear to me whether there is any
  reason to use reflection to support multiple versions, instead of just
  upgrading to Hive 0.13.0.

  One question I have is: what is the goal of upgrading to Hive 0.13.0?  Is
  it purely because you are having problems connecting to newer metastores?
  Are there some features you are hoping for?  This will help me prioritize
  this effort.
 
  Michael
 
 
  On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   I was looking for a class where reflection-related code should reside.
 
  I found this, but I don't think it is the proper class for bridging
  differences between hive 0.12 and 0.13.1:

  sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
 
  Cheers
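
For what it's worth, a purely illustrative sketch of the reflection-based bridge code 
being discussed (this is not Spark code; getAllPartitionsForPruner is the Hive 0.12 
method referenced in the error output below, while getAllPartitionsOf is my assumption 
for the Hive 0.13 replacement):

import scala.collection.JavaConverters._
import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}

// Illustrative only: pick whichever partition-listing method the Hive version on the
// classpath provides, so one build could run against either 0.12 or 0.13.
def allPartitions(client: Hive, table: Table): Seq[Partition] = {
  val name =
    if (client.getClass.getMethods.exists(_.getName == "getAllPartitionsForPruner"))
      "getAllPartitionsForPruner" // Hive 0.12
    else
      "getAllPartitionsOf"        // assumed Hive 0.13 name
  client.getClass.getMethod(name, classOf[Table])
    .invoke(client, table)
    .asInstanceOf[java.util.Collection[Partition]]
    .asScala.toSeq
}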
 
 
  On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   After manually copying the Hive 0.13.1 jars to the local Maven repo, I got the
  following errors when building the spark-hive_2.10 module:
 
  [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch;
   found   : String
   required: Array[String]
  [ERROR]   val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
  [ERROR]                                                             ^
  [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60: value getAllPartitionsForPruner is not a member of org.apache.hadoop.hive.ql.metadata.Hive
  [ERROR]     client.getAllPartitionsForPruner(table).toSeq
  [ERROR]            ^
  [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267: overloaded method constructor TableDesc with alternatives:
    (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
    and
    ()org.apache.hadoop.hive.ql.plan.TableDesc
   cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties)
  [ERROR]   val tableDesc = new TableDesc(
  [ERROR]                   ^
  [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub.
  [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub.
  [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub.
  [ERROR]
       while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
          during phase: typer
       library version: version 2.10.4
      compiler version: version 2.10.4
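
Before committing to a specific signature, a hedged way to confirm which get(...) 
overloads the hive-exec JAR on the classpath actually exposes (illustrative; it 
requires hive-exec on the REPL classpath, and the output differs between 0.12 and 
0.13.x):

// Illustrative check: print the signatures of CommandProcessorFactory.get seen at runtime.
import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory

classOf[CommandProcessorFactory].getMethods
  .filter(_.getName == "get")
  .foreach(println)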
 
  These compiler errors show incompatible changes between 0.12 and 0.13.1; e.g. the 
  first error corresponds to the following method in CommandProcessorFactory:
 public static