Create table from ORC or Parquet file?

2015-12-02 Thread Alexander Pivovarov
Hi Everyone

Is it possible to create a Hive table from an ORC or Parquet file without
specifying the field names and their types? ORC and Parquet files carry the
field names and types in their own metadata.
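[In Hive 1.x there is no built-in way to derive a table definition from the file's embedded schema. A common workaround (a sketch only; paths and columns below are placeholders) is to dump the schema with a file-inspection tool and then write the DDL by hand:

```sql
-- Inspect the embedded schema first (outside Hive):
--   hive --orcfiledump /path/to/file.orc        -- prints the ORC type description
--   parquet-tools schema /path/to/file.parquet  -- prints the Parquet schema
--
-- Then declare the table to match (these columns are placeholders),
-- pointing an external table at the existing data:
CREATE EXTERNAL TABLE my_orc_table (
  id   BIGINT,
  name STRING
)
STORED AS ORC
LOCATION '/path/to/orc/dir';
```
]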

Alex


Re: Hive on spark table caching

2015-12-02 Thread Xuefu Zhang
Depending on the query, Hive on Spark does implicitly cache datasets (not
necessarily the input tables) for performance benefits. Such queries
include multi-insert, self-join, self-union, etc. However, no caching
happens across queries at this time, which may be improved in the future.
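[As a hedged illustration of the multi-insert case mentioned above (table names are placeholders): a single scan of the source can feed two inserts, which is where the implicit caching can apply.

```sql
-- One scan of src serves both branches; Hive on Spark may cache the
-- scanned dataset instead of reading src twice.
FROM src
INSERT OVERWRITE TABLE dest_a SELECT key, value    WHERE key < 100
INSERT OVERWRITE TABLE dest_b SELECT key, count(1) GROUP BY key;
```
]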

Thanks,
Xuefu

On Wed, Dec 2, 2015 at 3:00 PM, Udit Mehta  wrote:

> Hi,
>
> I have started using Hive on Spark recently and am exploring the benefits
> it offers. I was wondering if Hive on Spark has the ability to cache tables
> like Spark SQL does. Or does it do any form of implicit caching in the
> long-running job it starts after running the first query?
>
> Thanks,
> Udit
>


RE: Hive on spark table caching

2015-12-02 Thread Mich Talebzadeh
OK

 

How did you build your Spark 1.3: from source, or the pre-built distribution
for Hadoop 2.6?

The combination I have:

1. Spark version 1.5.2
2. Hive version 1.2.1
3. Hadoop version 2.6

does not work with Hive on Spark.
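[For context, the Hive on Spark setup instructions call for a Spark assembly built without the Hive profile. A sketch of such a build follows; the script name and profile flags vary by Spark version, so treat these as illustrative only:

```shell
# Build Spark without -Phive so the assembly does not bundle Hive classes
# (profile names are illustrative; check your Spark version's build docs).
./make-distribution.sh --name hadoop2-without-hive --tgz \
  -Pyarn -Phadoop-2.6
```
]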

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

http://talebzadehmich.wordpress.com  

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 




Re: Hive on spark table caching

2015-12-02 Thread Udit Mehta
I'm using Spark 1.3 with Hive 1.2.1. I don't mind using a higher version of
Spark, but I read somewhere that 1.3 is the version of Spark currently
supported by Hive. Can I use Spark 1.4 or 1.5 with Hive 1.2.1?

On Wed, Dec 2, 2015 at 3:19 PM, Mich Talebzadeh  wrote:

> Hi,
>
>
>
> Which version of spark are you using please?
>
>
>
> Mich Talebzadeh
>
>
>


RE: Hive on spark table caching

2015-12-02 Thread Mich Talebzadeh
Hi,

 

Which version of spark are you using please?

 

Mich Talebzadeh

 




Hive on spark table caching

2015-12-02 Thread Udit Mehta
Hi,

I have started using Hive on Spark recently and am exploring the benefits
it offers. I was wondering if Hive on Spark has the ability to cache tables
like Spark SQL does. Or does it do any form of implicit caching in the
long-running job it starts after running the first query?

Thanks,
Udit


RE: how to get counts as a byproduct of a query

2015-12-02 Thread Ryan Harris
Personally, I'd do it this way...

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

SELECT suba.X, suba.Y, suba.countA, subb.Z, subb.countB
FROM
  (SELECT X, Y, count(1) OVER (PARTITION BY X) AS countA FROM table_a) suba
JOIN
  (SELECT X, Z, count(1) OVER (PARTITION BY X) AS countB FROM table_b) subb
ON (suba.X = subb.X);


==
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL 
and may contain information that is privileged and exempt from disclosure under 
applicable law. If you are neither the intended recipient nor responsible for 
delivering the message to the intended recipient, please note that any 
dissemination, distribution, copying or the taking of any action in reliance 
upon the message is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately.  Thank you.


RE: how to get counts as a byproduct of a query

2015-12-02 Thread Frank Luo
I might not have illustrated the problem well. Let's try a sample.

Here is what I have:
Table_A, with columns X and Y
Table_B, with columns X and Z

I want to join the two tables on column X, like:

select a.X, a.Y, b.Z
from Table_A a
join Table_B b on a.X = b.X

In the meantime, I want to get row counts for both Table_A and Table_B. I have
not been able to write a single query that does both.




How to register HCatlog Library as part of pig script file

2015-12-02 Thread mahender bigdata

Hi,
We would like to use an HCatalog table in our Pig script. Currently we launch
Pig with the -useHCatalog option to load the HCatalog libraries. Is there a
way to register the HCatalog jar files inside the Pig script itself, so the
script can be run directly from the Pig command prompt without any extra
options? Is this achievable?


/Mahender
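[One possible sketch (the jar locations below are placeholders and vary by distribution and version) is to REGISTER the jars that -useHCatalog would otherwise put on the classpath, at the top of the script:

```pig
-- Placeholder paths: adjust for your HCatalog/Hive installation.
REGISTER /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
REGISTER /usr/lib/hive/lib/hive-metastore.jar;
REGISTER /usr/lib/hive/lib/hive-exec.jar;

-- HCatLoader is provided by the registered jars.
A = LOAD 'default.my_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
DUMP A;
```

Depending on the Pig and HCatalog versions, additional transitive dependency jars may also need to be registered.]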


Re: how to get counts as a byproduct of a query

2015-12-02 Thread Jörn Franke
I am not sure I understand: why should this not be possible using SQL
in Hive?



RE: how to get counts as a byproduct of a query

2015-12-02 Thread Ryan Harris
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-MULTITABLEINSERT
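[Applied to the earlier example (a sketch only; the staging and count table names are made up for illustration), the multi-table insert described on that page reads each source once and emits the row count as a by-product:

```sql
-- Each source table is scanned once, feeding both outputs.
FROM table_a
INSERT OVERWRITE TABLE stage_a SELECT X, Y
INSERT OVERWRITE TABLE count_a SELECT count(1);

FROM table_b
INSERT OVERWRITE TABLE stage_b SELECT X, Z
INSERT OVERWRITE TABLE count_b SELECT count(1);

-- Then join the staged copies.
SELECT a.X, a.Y, b.Z
FROM stage_a a JOIN stage_b b ON a.X = b.X;
```
]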



RE: how to get counts as a byproduct of a query

2015-12-02 Thread Frank Luo
Didn’t get any response, so trying one more time. I cannot believe I am the 
only one facing the problem.

From: Frank Luo
Sent: Tuesday, December 01, 2015 10:40 PM
To: user@hive.apache.org
Subject: how to get counts as a byproduct of a query

Very often I need to run a query against one or more tables and also collect
some row counts. I am wondering if there is a way to kill two birds with one
stone by scanning the tables once. (I don't mind saving the counts to a
separate file or something like that.)

For example, I have tables A and B and need to inner-join them to get some
result. In the meantime, I also need the row counts for both A and B. Is
there a smart way to get the join result as well as the counts while reading
each table only once?

Thanks in advance.

This email and any attachments transmitted with it are intended for use by the 
intended recipient(s) only. If you have received this email in error, please 
notify the sender immediately and then delete it. If you are not the intended 
recipient, you must not keep, use, disclose, copy or distribute this email 
without the author’s prior permission. We take precautions to minimize the risk 
of transmitting software viruses, but we advise you to perform your own virus 
checks on any attachment to this message. We cannot accept liability for any 
loss or damage caused by software viruses. The information contained in this 
communication may be confidential and may be subject to the attorney-client 
privilege.