Hive version with Spark

2015-11-18 Thread Sofia
Hello

After various failed attempts to use my Hive (1.2.1) with my Spark (Spark 1.4.1 
built for Hadoop 2.2.0), I decided to try building Spark with Hive again.
I would like to know what the latest Hive version is that can be used to build 
Spark at this point.

When downloading Spark 1.5 source and trying:

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-1.2.1 
-Phive-thriftserver  -DskipTests clean package

I get:

The requested profile "hive-1.2.1" could not be activated because it does not 
exist.
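
One way to check which Hive-related profiles the Spark 1.5 POM actually defines is Maven's help plugin; a quick sketch (the grep filter is just for convenience, and the profile names in the output are not confirmed here):

mvn help:all-profiles | grep -i hive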

Thank you
Sofia

Re: hive transaction strange behaviour

2015-11-18 Thread Eugene Koifman
can you send ls -l on the partition where you expect a base and don't see it?
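
Something along these lines, with your actual warehouse path and partition substituted (the path below is hypothetical):

hdfs dfs -ls /apps/hive/warehouse/mydb.db/mytable/dt=2015-11-17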

From: Sanjeev Verma
Reply-To: "user@hive.apache.org"
Date: Tuesday, November 17, 2015 at 10:27 PM
To: "user@hive.apache.org"
Cc: "d...@hive.apache.org"
Subject: Re: hive transaction strange behaviour

Any help will be much appreciated. Thanks

On Tue, Nov 17, 2015 at 2:39 PM, Sanjeev Verma wrote:
Thanks Elliot, Eugene.
I am able to see the base file created in one of the partitions; it seems the 
compactor kicked in and created it, but it has not created base files in the rest of 
the partitions, where delta files still exist. Why has the compactor not picked the 
other partitions, and when and how will those partitions be picked up for compaction?

Thanks

On Sat, Nov 14, 2015 at 11:01 PM, Eugene Koifman wrote:
When the compaction process runs, it will create the base directory.
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration

At a minimum you need hive.compactor.initiator.on=true and hive.compactor.worker.threads > 0.
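
These properties normally live in hive-site.xml on the metastore side; a minimal sketch of passing them as startup overrides instead (assuming the compactor runs inside your metastore service):

hive --service metastore \
  --hiveconf hive.compactor.initiator.on=true \
  --hiveconf hive.compactor.worker.threads=1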

Also, see 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact
 on how to trigger compaction manually.
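
A minimal sketch of the manual trigger (table and partition names are hypothetical):

# Trigger a major compaction on one partition
hive -e "ALTER TABLE my_acid_table PARTITION (dt='2015-11-17') COMPACT 'major';"

# Inspect queued, running, and recently completed compactions
hive -e "SHOW COMPACTIONS;"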

Eugene

From: Sanjeev Verma
Reply-To: "user@hive.apache.org"
Date: Thursday, November 12, 2015 at 11:41 PM
To: "user@hive.apache.org", "d...@hive.apache.org"
Subject: hive transaction strange behaviour

I have enabled Hive transactions and am able to see the delta files created for 
some of the partitions, but I do not see any base file created yet. It seems 
strange to me to see so many delta files without any base file.
Could somebody let me know when the base file is created?

Thanks




Building Spark to use for Hive on Spark

2015-11-18 Thread Udit Mehta
Hi,

I am planning to test out the Hive on Spark functionality provided by the
newer versions of Hive. I wanted to know why it is necessary to remove the
Hive jars from the Spark build, as mentioned on this page.

This would require me to have two Spark builds, one with the Hive jars and
one without.

Any help is appreciated,
Udit


Re: Hive version with Spark

2015-11-18 Thread Udit Mehta
As per this link:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started,
you need to build Spark without Hive.
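
A sketch of what such a build might look like, reusing the flags from Sofia's command above minus the Hive profiles (the exact profile set recommended for Hive on Spark is on that wiki page):

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package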



Bulk load in Hive transactions backed table

2015-11-18 Thread Jagat Singh
Hi,

Is it possible to do a bulk load using files into a Hive table backed by
transactions, instead of using update statements?

Thanks


Re: Bulk load in Hive transactions backed table

2015-11-18 Thread Elliot West
Are you loading new data (inserts) or mutating existing data
(update/delete) or both? And by 'transactions' are you referring to Hive
ACID transactional tables? If so:

For new data, I think you should be able to use:

INSERT INTO transactional_table ... FROM table_over_file_to_be_loaded
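
A slightly fuller sketch of that pattern (schema, delimiter, and staging path are hypothetical):

# Expose the files as an external staging table, then insert into the ACID table
hive -e "
CREATE EXTERNAL TABLE staging_table (id INT, val STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/tmp/files_to_be_loaded';
INSERT INTO TABLE transactional_table SELECT * FROM staging_table;
"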


For updates, deletes, and inserts, there is the Mutation API, which allows you to
bulk-mutate large volumes of records in a single transaction; however, it is a
Java API and you'd need to implement a job to invoke it:
http://htmlpreview.github.io/?https://github.com/apache/hive/blob/master/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/package.html

Also, there is a proposed HQL MERGE command that would allow you to do
this, but it has not been implemented yet:
https://issues.apache.org/jira/browse/HIVE-10924

Elliot.




Hiveserver2 does not respond to connection requests

2015-11-18 Thread Sakib Shaikh
Hi, I've been trying to query Hive through Python, and I am trying to connect
to HiveServer2. It is running, but it never responds to any request. I tried
changing the hive-site.xml settings to get HiveServer2 to work, but now any
request I send from Python gives back errors. This is better than what I
had before, which would result in the program hanging forever.

I have a question on Stack Overflow with more details; the link is below.

Please help me get HiveServer2 working. Any help is much appreciated!

http://stackoverflow.com/questions/33792550/hiveserver2-gives-no-response-on-port-1
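
One quick check is whether HiveServer2 answers a trivial query outside Python at all; 10000 is the default HiveServer2 port, so adjust it to your configuration:

beeline -u "jdbc:hive2://localhost:10000/default" -e "SELECT 1;"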


Re: Building Spark to use for Hive on Spark

2015-11-18 Thread Gopal Vijayaraghavan


> I wanted to know why it is necessary to remove the Hive jars from the
> Spark build as mentioned on this page

Because SparkSQL was originally based on Hive & still uses the Hive AST to
parse SQL.

The org.apache.spark.sql.hive package contains the parser, which has
hard references to Hive's internal AST, which is unfortunately
auto-generated code (HiveParser.TOK_TABNAME etc).

Every time Hive makes a release, those constants can change in value; they are
private API precisely because there is no backwards-compatibility guarantee,
which SparkSQL violates.
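
A quick way to see those generated constants for yourself (the jar name is illustrative; javap -constants dumps the static final fields of the generated parser class):

javap -classpath hive-exec-1.2.1.jar -constants org.apache.hadoop.hive.ql.parse.HiveParser | grep TOK_TABNAME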

So Hive-on-Spark forces mismatched versions of Hive classes, because it's
a circular dependency of Hive(v1) -> Spark -> Hive(v2) due to the basic
laws of causality.

Spark cannot depend on a version of Hive that is unreleased, and a
Hive-on-Spark release cannot depend on a version of Spark that is
unreleased.

Cheers,
Gopal