Can partitioning and bucketing be used on an external table?

2013-07-18 Thread ch huang
ATT
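
For reference, Hive's DDL does accept both PARTITIONED BY and CLUSTERED BY on external tables. A minimal sketch with made-up table, column, and path names; note that for the bucketing metadata to be meaningful, the files must actually be written bucketed (for example via INSERT with hive.enforce.bucketing=true):

CREATE EXTERNAL TABLE page_views (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 16 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';

-- partitions of an external table are registered explicitly, pointing at existing directories
ALTER TABLE page_views ADD PARTITION (dt='2013-07-18')
  LOCATION '/data/page_views/dt=2013-07-18';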


Re: hive query is very slow, why?

2013-07-18 Thread ch huang
The table has more than 12000 records.

On Fri, Jul 19, 2013 at 9:34 AM, Stephen Boesch  wrote:

> one mapper.  how big is the table?
>
>
> 2013/7/18 ch huang 
>
>> I waited a long time with no result. Why is Hive so slow?
>>
>> hive> select cookie,url,ip,source,vsid,token,residence,edate from
>> hb_cookie_history where edate>='1371398400500' and edate<='1371400200500';
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_1374138311742_0007, Tracking URL =
>> http://CH22:8088/proxy/application_1374138311742_0007/
>> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill
>> job_1374138311742_0007
>>
>
>


Re: hive query is very slow, why?

2013-07-18 Thread Stephen Boesch
one mapper.  how big is the table?


2013/7/18 ch huang 

> I waited a long time with no result. Why is Hive so slow?
>
> hive> select cookie,url,ip,source,vsid,token,residence,edate from
> hb_cookie_history where edate>='1371398400500' and edate<='1371400200500';
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1374138311742_0007, Tracking URL =
> http://CH22:8088/proxy/application_1374138311742_0007/
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1374138311742_0007
>


hive query is very slow, why?

2013-07-18 Thread ch huang
I waited a long time with no result. Why is Hive so slow?

hive> select cookie,url,ip,source,vsid,token,residence,edate from
hb_cookie_history where edate>='1371398400500' and edate<='1371400200500';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1374138311742_0007, Tracking URL =
http://CH22:8088/proxy/application_1374138311742_0007/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1374138311742_0007
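
With no partition predicate, this query scans the whole table with a single map task, so the wait is dominated by the full scan itself. A hedged sketch of one common mitigation, assuming a hypothetical copy of the table partitioned by a date column dt (the table name, the dt column, and the dt value below are made up for illustration):

CREATE TABLE hb_cookie_history_part (
  cookie STRING, url STRING, ip STRING, source STRING,
  vsid STRING, token STRING, residence STRING, edate STRING
)
PARTITIONED BY (dt STRING);

-- only the partitions covering the requested edate range get scanned
SELECT cookie, url, ip, source, vsid, token, residence, edate
FROM hb_cookie_history_part
WHERE dt = '2013-06-16'   -- illustrative partition value
  AND edate >= '1371398400500' AND edate <= '1371400200500';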


Re: Hive Architecture - Execution on nodes

2013-07-18 Thread Alan Gates

On Jul 18, 2013, at 1:40 PM, Tzur Turkenitz wrote:

> Hello,
> Just finished reading the Hive-Architecture pdf, and failed to find the 
> answers I was hoping for. So here I am, hoping this community will shed some 
> light.
> I think I know what the answers will be, but I need that bolted down and secured.
>  
> We are concerned about how data is transferred between data nodes and Hive, 
> especially when it comes to clusters where there's no SSL between nodes.
>  
> And this is the use case:
> 1.   Table employee is a Hive table, with SerDe
> 2.   MapReduce job accesses the table Employees which holds Encrypted data
> 3.   SerDe decrypts the data
> 4.   Post-SerDe output is returned to the MapReduce job and saved to a 
> new Hive table using a new Encryption implementation
>  
> The flow, as I think it currently is, should be:
> MapReduce Job --> Read table metadata --> SerDe creates map-reduce job -->
> distributes across nodes
>  
> Which means that data is decrypted on the local nodes and then sent in 
> clear-text back to the original map-reduce job to be saved in a new table.
> Is that correct? :(

No. Data deserialization (which is what a SerDe does, not decryption) is done 
as part of reading the data inside the map-reduce job. Only query parsing, 
validation, and planning are done on the client node.

Alan.
>  
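
In DDL terms, the SerDe named on the table is instantiated inside the map tasks that read the splits, so any decryption it performs happens on the worker nodes rather than on the client that submits the query. A hypothetical sketch; the SerDe class name and its property are made up:

CREATE EXTERNAL TABLE employee_encrypted (
  emp_id   STRING,
  emp_name STRING,
  salary   STRING
)
ROW FORMAT SERDE 'com.example.crypto.DecryptingSerDe'   -- hypothetical class
WITH SERDEPROPERTIES ('crypto.key.alias' = 'hr-key')    -- hypothetical property
STORED AS TEXTFILE
LOCATION '/secure/employee';

-- deserialize() (and therefore the decrypt step) runs in the map tasks that read this table
SELECT emp_id, emp_name FROM employee_encrypted;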



Hive Architecture - Execution on nodes

2013-07-18 Thread Tzur Turkenitz
Hello,

Just finished reading the Hive-Architecture pdf, and failed to find the
answers I was hoping for. So here I am, hoping this community will shed some
light.

I think I know what the answers will be, but I need that bolted down and
secured.

 

We are concerned about how data is transferred between data nodes and Hive,
especially when it comes to clusters where there's no SSL between nodes.

 

And this is the use case:

1.   Table employee is a Hive table, with SerDe

2.   MapReduce job accesses the table Employees which holds Encrypted
data

3.   SerDe decrypts the data

4.   Post-SerDe output is returned to the MapReduce job and saved to a
new Hive table using a new Encryption implementation

 

The flow, as I think it currently is, should be:

MapReduce Job --> Read table metadata --> SerDe creates map-reduce job
--> distributes across nodes

 

Which means that data is decrypted on the local nodes and then sent in
clear-text back to the original map-reduce job to be saved in a new table.

Is that correct? :(

 



unsubscribe

2013-07-18 Thread S Byrne
unsubscribe

> This message has no content.


Re: unsubscribe

2013-07-18 Thread Ted Yu
You have to send a mail to user-unsubscr...@hive.apache.org

On Thu, Jul 18, 2013 at 1:30 PM, Beau Rothrock wrote:

>


unsubscribe

2013-07-18 Thread Beau Rothrock


Re: Hive - Alter column datatype

2013-07-18 Thread Jonathan Medwig
Changing the datatype of a column will *not* alter the column's data 
itself - just Hive's metadata for that table. To modify the type of 
existing data:


1. Create a new table with the desired structure
2. Copy the existing table into the new table - applying any necessary type casting
3. Drop the old table and rename the new one to the old one's name

See more info here -
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ChangeColumnName%2FType%2FPosition%2FComment
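
A minimal HiveQL sketch of those three steps, using a made-up table events whose amount column changes from STRING to DOUBLE:

-- 1. new table with the desired column type
CREATE TABLE events_new (
  id     BIGINT,
  amount DOUBLE,   -- was STRING in the old table
  ts     STRING
);

-- 2. copy the data, casting where needed
INSERT OVERWRITE TABLE events_new
SELECT id, CAST(amount AS DOUBLE), ts FROM events;

-- 3. drop the old table and take over its name
DROP TABLE events;
ALTER TABLE events_new RENAME TO events;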

Jon

On 7/18/13 2:45 AM, Manickam P wrote:

Hi experts,

I have created a table in hive and loaded the data into it. now i want 
to change the datatype of one particular column.

Do i need to drop and move the file again to hive?
will it work fine if i just alter the data type alone in hive?



Thanks,
Manickam P




Hive Metastore cannot start

2013-07-18 Thread Nguyen Chung
--help me!---





Hi Hive users! Please help me, I can’t start Hive Metastore!



I am starting the Hive Metastore and using Kerberos for authentication.



In Kerberos, I created a principal: bss_ra/mas...@example.com



[bss_ra@master sbin]$ sudo kadmin.local

Authenticating as principal root/ad...@example.com with password.

kadmin.local:  addprinc -randkey bss_ra/mas...@example.com

WARNING: no policy specified for bss_ra/mas...@example.com; defaulting to
no policy

Principal "bss_ra/mas...@example.com" created



And keytab file for principal above:   bss_ra.service.keytab



kadmin.local:  xst -norandkey -k bss_ra.service.keytab bss_ra/
mas...@example.com

Entry for principal bss_ra/mas...@example.com with kvno 1, encryption type
aes256-cts-hmac-sha1-96 added to keytab WRFILE:bss_ra.service.keytab.

Entry for principal bss_ra/mas...@example.com with kvno 1, encryption type
aes128-cts-hmac-sha1-96 added to keytab WRFILE:bss_ra.service.keytab.

Entry for principal bss_ra/mas...@example.com with kvno 1, encryption type
des3-cbc-sha1 added to keytab WRFILE:bss_ra.service.keytab.

Entry for principal bss_ra/mas...@example.com with kvno 1, encryption type
arcfour-hmac added to keytab WRFILE:bss_ra.service.keytab.

Entry for principal bss_ra/mas...@example.com with kvno 1, encryption type
des-hmac-sha1 added to keytab WRFILE:bss_ra.service.keytab.

Entry for principal bss_ra/mas...@example.com with kvno 1, encryption type
des-cbc-md5 added to keytab WRFILE:bss_ra.service.keytab.



And in my hive-site.xml configuration, I configured:







<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>master:9311</value>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://master:54310/user/hive/warehouse</value>
  </property>

  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>duytx</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>duytxt</value>
  </property>

  <property>
    <name>hive.exec.parallel</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/u01/APP/bss_ra/bss_ra.service.keytab</value>
  </property>

  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>bss_ra/mas...@example.com</value>
    <description>The service principal for the metastore thrift server. The special string _HOST will be replaced automatically with the correct host name.</description>
  </property>

</configuration>


And these are logs when I start hive metastore:



[bss_ra@master bin]$ ./hive --service metastore

Starting Hive Metastore Server

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.

org.apache.thrift.transport.TTransportException: Kerberos principal should have 3 parts: bss_ra
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server.createTransportFactory(HadoopThriftAuthBridge20S.java:268)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:3780)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:3729)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Exception in thread "main" org.apache.thrift.transport.TTransportException: Kerberos principal should have 3 parts: bss_ra
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server.createTransportFactory(HadoopThriftAuthBridge20S.java:268)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:3780)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:3729)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)





Please tell me where I went wrong, or whether there is any configuration I have missed?



Thank you very much!!


-- 

Nguyễn Văn Chung
Mobile: +84989746579

Re: Hive - Alter column datatype

2013-07-18 Thread Stephen Sprague
Let's put it this way: what makes you think that altering the datatype wouldn't
work? After all, that's why it's there. :)


On Thu, Jul 18, 2013 at 2:45 AM, Manickam P  wrote:

> Hi experts,
>
> I have created a table in hive and loaded the data into it. now i want to
> change the datatype of one particular column.
> Do i need to drop and move the file again to hive?
> will it work fine if i just alter the data type alone in hive?
>
>
>
> Thanks,
> Manickam P
>


Hive - Alter column datatype

2013-07-18 Thread Manickam P
Hi experts,
I have created a table in Hive and loaded the data into it. Now I want to 
change the datatype of one particular column. Do I need to drop the table and move the 
file to Hive again, or will it work fine if I just alter the datatype in Hive?


Thanks,
Manickam P

Re: Use RANK OVER PARTITION function in Hive 0.11

2013-07-18 Thread Jérôme Verdier
Hi,

Since we saw that we have to give arguments to the RANK() function, I'm trying
to translate this one (which works on Oracle 10g) so that it works in Hive:

RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit ORDER BY
pnvente.dt_debut_commercial DESC,
COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang

I tried this:

RANK(pnvente.dt_debut_commercial,
COALESCE(pnvente.id_produit,dem.id_produit)) OVER (PARTITION BY
mag.co_magasin, dem.id_produit ORDER BY pnvente.dt_debut_commercial DESC,
COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang

and this:


RANK(pnvente.dt_debut_commercial, pnvente.id_produit, dem.id_produit) OVER
(PARTITION BY mag.co_magasin, dem.id_produit ORDER BY
pnvente.dt_debut_commercial DESC,
COALESCE(pnvente.id_produit,dem.id_produit) DESC) as rang

But Hive is giving me another error:

FAILED: SemanticException Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check for
circular dependencies.
Underlying error: Ranking Functions can take no arguments

I don't understand this error: on the first try, it said it can't work
without arguments, and now the rank function is failing because of the
arguments.

What is wrong now?
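
For what it's worth, the Oracle original above already calls RANK() with no arguments, so the closest Hive 0.11 form keeps the argument list empty and carries everything in the OVER clause; whether it runs cleanly may still depend on the HIVE-4663 bug mentioned later in this thread. The FROM clause and joins are elided here, as in the original message:

SELECT mag.co_magasin,
       dem.id_produit,
       RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit
                    ORDER BY pnvente.dt_debut_commercial DESC,
                             COALESCE(pnvente.id_produit, dem.id_produit) DESC) AS rang
FROM ...   -- joins omitted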



2013/7/17 Richa Sharma 

> my bad ... in relational databases we generally do not give a column name
> inside rank() ... but the one in (partition by  order by..) is
> sufficient.
>
> But looks like that's not the case in Hive
>
>
> Jerome,
>
> Please look at the examples in link below. See if you are able to make it
> work
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-PARTITIONBYwithpartitioning%2CORDERBY%2Candwindowspecification
>
>
>
> Can't help you beyond this as I don't have Hive 0.11 :-(
>
>
> Richa
>
>
> On Wed, Jul 17, 2013 at 3:08 PM, Jérôme Verdier <
> verdier.jerom...@gmail.com> wrote:
>
>> Hi Richa,
>>
>> I have tried one query, with what i've understand of  Vijay's tips.
>>
>> SELECT code_entite, RANK(mag.me_vente_ht) OVER (PARTITION BY
>> mag.co_societe ORDER BY  mag.me_vente_ht) AS rank FROM
>> default.thm_renta_rgrp_produits_n_1 mag;
>>
>> This query is working, it gives me results.
>>
>> You say that maybe I'm hitting the same bug as JIRA HIVE-4663, but the query
>> is also failing when I put analytical columns in...
>>
>>
>> 2013/7/17 Richa Sharma 
>>
>>> Vijay
>>>
>>> Jerome has already passed column -> mag.co_societe for rank.
>>>
>>> syntax -> RANK() OVER (PARTITION BY mag.co_societe ORDER BY
>>> mag.me_vente_ht)
>>> This will generate a rank for column mag.co_societe based on column
>>> value me_vente_ht
>>>
>>> Jerome,
>>>
>>> It's possible you are also hitting the same bug as I mentioned in my
>>> email before.
>>>
>>>
>>> Richa
>>>
>>>
>>> On Wed, Jul 17, 2013 at 2:31 PM, Vijay  wrote:
>>>
 As the error message states, "One or more arguments are expected," you
 have to pass a column to the rank function.


 On Wed, Jul 17, 2013 at 1:12 AM, Jérôme Verdier <
 verdier.jerom...@gmail.com> wrote:

> Hi Richa,
>
> I have tried a simple query without joins, etc
>
> SELECT RANK() OVER (PARTITION BY mag.co_societe ORDER BY
> mag.me_vente_ht),mag.co_societe, mag.me_vente_ht FROM
> default.thm_renta_rgrp_produits_n_1 mag;
>
> Unfortunately, the error is the same like previously.
>
> Error: Query returned non-zero code: 4, cause: FAILED:
> SemanticException Failed to breakup Windowing invocations into Groups. At
> least 1 group must only depend on input columns. Also check for circular
> dependencies.
>
> Underlying error:
> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
> arguments are expected.
> SQLState:  42000
> ErrorCode: 4
>
>
>
>
> 2013/7/17 Richa Sharma 
>
>> Jerome
>>
>> I would recommend that you try Rank function with columns from just
>> one table first.
>> Once it is established that rank is working fine then add all the
>> joins.
>>
>> I am still on Hive 0.10 so cannot test it myself.
>> However, I found a similar issue at the following link, so it's
>> possible you are facing issues due to this reported bug.
>>
>> https://issues.apache.org/jira/browse/HIVE-4663
>>
>>
>> Richa
>>
>>
>> On Tue, Jul 16, 2013 at 6:41 PM, Jérôme Verdier <
>> verdier.jerom...@gmail.com> wrote:
>>
>>> You can see my query below :
>>>
>>> SELECT
>>> mag.co_magasin,
>>> dem.id_produit  as
>>> id_produit_orig,
>>> pnvente.dt_debut_commercial as
>>> dt_debut_commercial,
>>> COALESCE(pnvente.id_produit,dem.id_produit) as
>>> id_produit,
>>> RANK() OVER (PARTITION BY mag.co_magasin, dem.id_produit
>>> ORDER

Re: Hive does not package a custom inputformat in the MR job jar when the custom inputformat class is added as an aux jar.

2013-07-18 Thread Matouk IFTISSEN
That is what I searched for a long time, with no responses. But if you are not
in the cloud (AWS, Azure, ...), you can add the jar on all your DataNodes in
$HADOOP_HOME/lib, and then restart the mapreduce-tasktracker service like
this:

/etc/init.d/*mapreduce-tasktracker stop

/etc/init.d/*mapreduce-tasktracker start

Hope this helps you ;)


2013/7/18 Andrew Trask 

> Put them in hive's lib folder?
>
> Sent from my Rotary Phone
>
> On Jul 17, 2013, at 11:14 PM, Mitesh Peshave  wrote:
>
> > Hello,
> >
> > I am trying to use a custom inputformat for a hive table.
> >
> > When I add the jar containing the custom inputformat through a client,
> such as beeline, by executing the "add jar" command, all seems to work fine.
> In this scenario, hive seems to pass the inputformat class to the JT and TTs. I
> believe it correctly adds the jar to the distributed cache, and the MR
> jobs complete without any errors.
> >
> > But when I add the jar containing the custom input format under the hive
> auxlibs dir or the hive lib dir, hive does not seem to pass the inputformat
> class to the JT and TTs, causing the MR jobs to fail with
> ClassNotFoundException.
> >
> > The use-case I am looking at here is multiple users connecting to the
> HiveServer using hive clients and querying a table that uses a custom
> inputformat. I would not want each user to have to run the "add
> jar" command before they start querying the table.
> >
> > Is there a way to add extra jars to the hive server once and force the
> server to push these jars to the JT for every MR job it generates?
> >
> > Appreciate,
> > Mitesh
>
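
For reference, the per-session workaround described above looks roughly like this; the jar path, inputformat class, and table layout are made up for illustration:

-- run once per client session, e.g. in beeline
ADD JAR /tmp/custom-inputformat.jar;

CREATE EXTERNAL TABLE raw_events (line STRING)
STORED AS
  INPUTFORMAT 'com.example.io.CustomTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/raw_events';

-- the added jar is shipped to the MR job via the distributed cache
SELECT count(*) FROM raw_events;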


Re: which approach is better

2013-07-18 Thread Bennie Schut
The best way to restore is from a backup. We use distcp to keep this 
scalable: http://hadoop.apache.org/docs/r1.2.0/distcp2.html
The data we feed to hdfs also gets pushed to this backup and the 
metadatabase from hive also gets pushed here. So this combination works 
well for us (had to use it once).
Even if a namenode could never crash and all software worked fine 100% 
of the time there is always the one crazy user/admin who will find a way 
to wipe all data.

To me backups are not optional.

On 17-7-2013 20:17, Hamza Asad wrote:
I use the data to generate reports on a daily basis and do a couple of 
analyses; it is insert-once and read-many, on a daily basis. But my main 
purpose is to secure my data and easily recover it even if my Hadoop 
(datanode) or HDFS crashes. Up till now I have been using an approach in 
which data is retrieved directly from HDFS; a few days back my Hadoop 
crashed, and when I repaired it I was unable to recover my old data which 
resided on HDFS. So please let me know: do I have to make an architectural 
change, or is there any way to recover data that resides in a crashed HDFS?



On Wed, Jul 17, 2013 at 11:00 PM, Nitin Pawar wrote:


what's the purpose of the data storage?
what's the read and write throughput you expect?
what's the way you will access the data when reading?
what are your SLAs on both read and write?

there will be more questions others will ask, so be ready for that :)



On Wed, Jul 17, 2013 at 11:10 PM, Hamza Asad
<hamza.asa...@gmail.com> wrote:

Please let me know which approach is better: either I save my
data directly to HDFS and run Hive (Shark) queries over it, OR I
store my data in HBase and then query it, as I want to
ensure efficient data retrieval and that the data remains safe and can
easily be recovered if Hadoop crashes.

-- 
Muhammad Hamza Asad





-- 
Nitin Pawar





--
Muhammad Hamza Asad