Re: MongoDB storage handler for HIVE

2011-11-17 Thread Stephen Boesch
Nice idea!  I have worked a bit with Mongo and am leaning towards Hive.
This could be a nice combo. Will check it out (pun intended).

2011/11/17 YC Huang ychuang...@gmail.com

 I just have a quick and dirty implementation of a MongoDB storage handler
 for HIVE; the project is hosted on GitHub:
 https://github.com/yc-huang/Hive-mongo.
 Since Hive tables do not support 'update' efficiently, we use MongoDB to
 store such data, e.g. 'meta' data like user profile info, which needs to be
 updated occasionally.
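
 For a concrete picture, a storage-handler-backed table definition generally
 has the shape sketched below. The handler class name and the property keys
 here are illustrative assumptions, not the project's actual API, so check
 the README on GitHub for the exact names:

 -- minimal sketch; handler class and property keys are hypothetical
 CREATE EXTERNAL TABLE user_profile (id INT, name STRING)
 STORED BY 'org.yc.hive.mongo.MongoStorageHandler'
 WITH SERDEPROPERTIES ('mongo.column.mapping' = 'id,name')
 TBLPROPERTIES ('mongo.uri' = 'mongodb://localhost:27017/test.user_profile');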

 Posting here in case someone else needs the same thing; bug reports and
 suggestions are also welcome. Thanks.

 YC





Re: Mysql metastore configuration error.

2011-11-21 Thread Stephen Boesch
Was that code above *verbatim?* Because there is a typo:

hive> Load *s*ata local inpath ‘path/to/abcd.txt’ into table abcd;

(load sata, not load data)
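
The corrected statement would be:

hive> load data local inpath 'path/to/abcd.txt' into table abcd;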

2011/11/21 Aditya Singh30 aditya_sing...@infosys.com

 Hi Everybody,

 I am using Apache’s Hadoop-0.20.2 and
 Apache’s Hive-0.7.0. I have a 2 node cluster. One Redhat Linux 6.0(Hadoop
 Server) and other Windows 7 using Cygwin. The Hadoop cluster is working
 fine. I have checked by executing various examples provided with Hadoop.
 Map reduce jobs are being executed fine. For Hive I am using MySQL for the
 metastore, with the following configuration in hive-site.xml:


 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>hiveuser</value>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>hiveuser</value>
 </property>

 <property>
   <name>datanucleus.autoCreateSchema</name>
   <value>false</value>
 </property>

 <property>
   <name>datanucleus.fixedDatastore</name>
   <value>true</value>
 </property>


 I created the DB and hiveuser in mysql using the following commands:

 mysql> CREATE DATABASE metastore;

 mysql> USE metastore;

 mysql> SOURCE
 /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-0.7.0.mysql.sql;

 mysql> CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hiveuser';

 mysql> GRANT ALL ON metastore.* TO 'hiveuser'@'%';

 I created a table using the following command on hive:

 hive> Create table abcd(ab int, cd string) row format delimited fields
 terminated by ‘#’ stored as textfile;


 Then I created a file abcddata.txt containing the following data

 11#aa

 22#bb

 33#cc


 Then I loaded this data into table abcd using:

 hive> Load sata local inpath ‘path/to/abcd.txt’ into table abcd;


 Now when I execute “select * from abcd” it runs successfully and shows
 the data in abcd. 

 But if I run “select ab from abcd” or “select * from abcd where cd=’aa’”
 it returns an error:


 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.MapRedTask


 In the logs I found:

 Caused by: java.util.NoSuchElementException

 at java.util.Vector.lastElement(Vector.java:456)

 at com.sun.beans.ObjectHandler.lastExp(ObjectHandler.java:134)

 at
 com.sun.beans.ObjectHandler.dequeueResult(ObjectHandler.java:138)

 at java.beans.XMLDecoder.readObject(XMLDecoder.java:201)

 at
 org.apache.hadoop.hive.ql.exec.Utilities.deserializeMapRedWork(Utilities.java:462)
 

 at
 org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:184)
 


 And when I tried to access Hive from a java program using the connection
 string:

 (jdbc:mysql://master:3306/metastore,hiveuser,hiveuser)

 Running the command “describe abcd”, it returns:

 Exception in thread "main"
 com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table
 'metastore.abcd' doesn't exist


 Then on the mysql server I ran:

 mysql> use metastore;

 mysql> show tables;


 The table abcd is not there. The table is not being stored in the mysql
 metastore db. 

 So how come, on the Hive CLI, “select * from abcd” shows the data in the
 table, and “show tables” lists abcd? It seems the Hive CLI is not using the
 mysql metastore for storing tables or for “select *” statements, but
 whenever a statement requires map reduce jobs, or when accessing via a java
 program using the connection string, the mysql metastore is used. It must
 be some configuration mistake, I think. Please help me out.


 Regards,

 Aditya Singh 

 Infosys, India.


Re: Important Question

2012-01-25 Thread Stephen Boesch
Dalia,
 Your requirements appear to be transaction oriented, and thus OLTP systems
- i.e. regular relational databases - are more likely to be suitable than a
Hive (/Hadoop) based solution.  Hive is more for business intelligence and
certainly involves latencies which - given you say 'realtime' - would likely
not be acceptable for your application.

stephenb

2012/1/25 Dalia Sobhy dalia.mohso...@hotmail.com

  I will explain more to you, Mike.

 I am building a Service Oriented Architecture, and I want my API to provide
 some services such as Add/Delete Patients, Search for a patient by name/ID,
 and Count the number of people who are suffering from measles in Alexandria,
 Egypt.

 Something like that, so I am wondering which best suits my API?

  To: dalia.mohso...@hotmail.com
  CC: u...@hbase.apache.org; user@hive.apache.org
  Subject: Re: Important Question
  From: mspre...@us.ibm.com
  Date: Wed, 25 Jan 2012 12:05:39 -0500

 
  BTW, what do you mean by realtime? Do you mean you want to run some
  non-trivial query quickly enough for some sort of interactive use? Can
  you give us a feel for the sort of queries that interest you?
 
  Thanks,
  Mike
 
 
 
  From: Dalia Sobhy dalia.mohsobhy@hotmail.com
  To: u...@hbase.apache.org u...@hbase.apache.org
  Cc: user@hive.apache.org user@hive.apache.org,
  u...@hbase.apache.org u...@hbase.apache.org
  Date: 01/25/2012 11:34 AM
  Subject: Re: Important Question
 
 
 
  So what about HBQL??
  And if I had complex queries, would I get stuck with HBase?
 
  Also, can anyone provide me with examples of a table in an RDBMS
  transformed into hbase, for realtime query and analytical processing..
 
  Sent from my iPhone
 
  On 2012-01-25, at 6:15 PM, bejoy...@yahoo.com wrote:
 
   Real Time.. Definitely not hive. Go in for HBase, but don't expect HBase
  to be as flexible as an RDBMS. You need to choose your Row Key and Column
  Families wisely as per your requirements.
   For data mining and analytics you can mount a Hive table over the
  corresponding HBase table and play with SQL-like queries.
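
   As a rough sketch of what such a mount looks like (the table and column
  names here are made up for illustration; the storage handler and its
  properties are the standard Hive-HBase integration ones):

   CREATE EXTERNAL TABLE patients_hive(rowkey STRING, name STRING, diagnosis STRING)
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name,info:diagnosis')
   TBLPROPERTIES ('hbase.table.name' = 'patients');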
  
  
  
   Regards
   Bejoy K S
  
   -Original Message-
   From: Dalia Sobhy dalia.mohso...@hotmail.com
   Date: Wed, 25 Jan 2012 17:01:08
   To: u...@hbase.apache.org; user@hive.apache.org
   Reply-To: user@hive.apache.org
   Subject: Important Question
  
  
   Dear all,
    I am developing an API for medical use, i.e. hospital admissions and all
   about patients, thus transactions and queries and realtime data are
   important here...
    Therefore both real-time and analytical processing is a must..
    Therefore, which best suits my application: HBase, Hive, or another
   method?
    Please reply quickly because this is critical. Thanks a million ;)
 
 



Error while reading from task log url

2012-03-29 Thread Stephen Boesch
Hi
  I am able to run certain hive commands, e.g. create table and select, but
not others. Also my hadoop pseudo distributed cluster is working fine -
I can run the examples.

Examples of commands that fail:

insert overwrite table demographics select * from demographics_local;
Control-C (killing a task ends up with the same error, “Error
while reading from task log url”)


Hadoop job information for Stage-0: number of mappers: 1; number of
reducers: 0
2012-03-29 08:05:40,699 Stage-0 map = 0%,  reduce = 0%
2012-03-29 08:06:10,868 Stage-0 map = 100%,  reduce = 100%
Ended Job = job_201203231956_0010 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201203231956_0010_m_02 (and more) from job
job_201203231956_0010
Exception in thread "Thread-160" java.lang.RuntimeException: Error while
reading from task log url
at
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at
org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for
URL:
http://localhost:50060/tasklog?taskid=attempt_201203231956_0010_m_00_3&start=-8193
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask



I am running hive-0.8.1 against hadoop-1.0.0


Re: Error while reading from task log url

2012-03-29 Thread Stephen Boesch
When I go to that url here is the result:

HTTP ERROR 400

Problem accessing /tasklog. Reason:

Argument attemptid is required

--
Powered by Jetty://
2012/3/29 Stephen Boesch java...@gmail.com

 Hi
   I am able to run certain hive commands, e.g. create table and select,
 but not others. Also my hadoop pseudo distributed cluster is working
 fine - I can run the examples.

 Examples of commands that fail:

 insert overwrite table demographics select * from demographics_local;
 Control-C (killing a task ends up with the same error, “Error
 while reading from task log url”)


 Hadoop job information for Stage-0: number of mappers: 1; number of
 reducers: 0
 2012-03-29 08:05:40,699 Stage-0 map = 0%,  reduce = 0%
 2012-03-29 08:06:10,868 Stage-0 map = 100%,  reduce = 100%
 Ended Job = job_201203231956_0010 with errors
 Error during job, obtaining debugging information...
 Examining task ID: task_201203231956_0010_m_02 (and more) from job
 job_201203231956_0010
 Exception in thread "Thread-160" java.lang.RuntimeException: Error while
 reading from task log url
  at
 org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
 at
 org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
  at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
  at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Server returned HTTP response code: 400
 for URL:
 http://localhost:50060/tasklog?taskid=attempt_201203231956_0010_m_00_3&start=-8193
  at
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
  at java.net.URL.openStream(URL.java:1010)
 at
 org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
  ... 3 more
 FAILED: Execution Error, return code 2 from
 org.apache.hadoop.hive.ql.exec.MapRedTask



 I am running hive-0.8.1 against hadoop-1.0.0



How to use create .. as select

2012-03-29 Thread Stephen Boesch
I see HIVE-31 supposedly supports this, but when mimicking the syntax in
the JIRA I get errors:

https://issues.apache.org/jira/browse/HIVE-31

hive> create table dem select demographics_local.* from
demographics_local;
FAILED: Parse Error: line 1:19 cannot recognize input near 'select'
'demographics_local' '.' in create table statement

I tried some variants on the above  with similar results:

hive> insert overwrite table dem select * from demographics_local;
FAILED: Error in semantic analysis: Line 1:23 Table not found 'dem'


If i create the target table first, then things are ok.  But is that
necessary?

hive> create table dem(uid int, age string, gender string, dlocation string,
 children string, home_market_value string, home_owner_status string,
home_property_type string,
 household_income string, length_of_residence string, marital_status
string)
 ;
OK
hive> insert overwrite table dem select * from
demographics_local;
Total MapReduce jobs = 2
Launching Job 1 out of 2
...
Table default.dem stats: [num_partitions: 0, num_files: 1, num_rows: 0,
total_size: 9946167, raw_data_size: 0]
301399 Rows loaded to dem
OK


Re: How to use create .. as select

2012-03-30 Thread Stephen Boesch
Hi Bejoy,
The syntax you suggested does work.

I have many years of Oracle (as well as other RDBMSs), so it would have
been more natural to have assumed the AS was present: but instead I
followed the syntax in the JIRA that came up (and which lacks the AS
clause). But as Edward mentions, the correct way to do this is to use
the hive manual, and that is what I will do going forward.

cheers
stephenb

2012/3/29 Bejoy Ks bejoy...@yahoo.com

 Hi Stephen
   You are missing AS in your statement, try this out
 Create table dem AS select *  from demographics_local;

 Regards
 Bejoy KS
   --
 *From:* Stephen Boesch java...@gmail.com
 *To:* user@hive.apache.org
 *Sent:* Thursday, March 29, 2012 10:45 PM
 *Subject:* How to use create .. as select

 I see HIVE-31 supposedly supports this, but when mimicking the syntax in
 the JIRA I get errors:

 https://issues.apache.org/jira/browse/HIVE-31

 hive> create table dem select demographics_local.* from
 demographics_local;
 FAILED: Parse Error: line 1:19 cannot recognize input near 'select'
 'demographics_local' '.' in create table statement

 I tried some variants on the above  with similar results:

 hive> insert overwrite table dem select * from demographics_local;
 FAILED: Error in semantic analysis: Line 1:23 Table not found 'dem'


 If i create the target table first, then things are ok.  But is that
 necessary?

 hive> create table dem(uid int, age string, gender string, dlocation
 string,
  children string, home_market_value string, home_owner_status string,
 home_property_type string,
  household_income string, length_of_residence string, marital_status
 string)
  ;
 OK
 hive> insert overwrite table dem select * from
 demographics_local;
 Total MapReduce jobs = 2
 Launching Job 1 out of 2
 ...
 Table default.dem stats: [num_partitions: 0, num_files: 1, num_rows: 0,
 total_size: 9946167, raw_data_size: 0]
 301399 Rows loaded to dem
 OK







Custom hive-site.xml is ignored, how to find out why

2012-11-24 Thread Stephen Boesch
It seems the customized hive-site.xml is not being read. It lives under
$HIVE_HOME/conf (which happens to be /shared/hive/conf). I have tried
everything there is to try: set HIVE_CONF_DIR=/shared/hive/conf, added
--config /shared/hive/conf, and added debugging to the hive shell script
(bash -x) to ensure that the correct directory is actually getting used.

The properties inside hive-site.xml are getting ignored: it was originally
set to use mysql, but instead the default derby is getting used. Then I tried
changing hive.metastore.local between true and false: no difference in
behavior - it just goes to /tmp/$USER for creating the derby no matter what.

I wondered whether hive-site.xml maybe had a syntax error and was getting
ignored: so I removed everything except <configuration></configuration>:
still no luck.

This is the totally simplified hive-site.xml: it just has enough to try to
see if it is actually being read/applied (and it is not..):


steve@mithril:/shared/hive/conf$ cat hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Hive Execution Parameters -->
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/shared/hive/lib/hive-hwi-0.9.0.war</value>
  <description>This is the WAR file with the jsp content for Hive Web
Interface</description>
</property>

</configuration>


After running some DDL in hive, for example, no files are created
underneath /tmp/hive (instead they go to /tmp/$USER, which is the
default) - as if the custom hive-site.xml never existed.


Re: Custom hive-site.xml is ignored, how to find out why

2012-11-24 Thread Stephen Boesch
It appears that I was missing the *hive.metastore.uris* parameter.  That
one was not mentioned in the (several) blogs / tutorials that I had seen.
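
For reference, the property has the following shape in hive-site.xml (the
host and port here are placeholders; 9083 is the conventional metastore
port, so adjust to wherever the metastore service is running):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>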


2012/11/24 Stephen Boesch java...@gmail.com


 It seems the customized hive-site.xml is not being read. It lives under
 $HIVE_HOME/conf (which happens to be /shared/hive/conf). I have tried
 everything there is to try: set HIVE_CONF_DIR=/shared/hive/conf, added
 --config /shared/hive/conf, and added debugging to the hive shell script
 (bash -x) to ensure that the correct directory is actually getting used.

 The properties inside hive-site.xml are getting ignored: it was originally
 set to use mysql, but instead the default derby is getting used. Then I tried
 changing hive.metastore.local between true and false: no difference in
 behavior - it just goes to /tmp/$USER for creating the derby no matter what.

 I wondered whether hive-site.xml maybe had a syntax error and was getting
 ignored: so I removed everything except <configuration></configuration>:
 still no luck.

 This is the totally simplified hive-site.xml: it just has enough to try to
 see if it is actually being read/applied (and it is not..):


 steve@mithril:/shared/hive/conf$ cat hive-site.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <configuration>

 <!-- Hive Execution Parameters -->
 <property>
   <name>hive.metastore.local</name>
   <value>false</value>
 </property>

 <property>
   <name>hive.exec.scratchdir</name>
   <value>/tmp/hive/hive-${user.name}</value>
   <description>Scratch space for Hive jobs</description>
 </property>

 <property>
   <name>hive.hwi.war.file</name>
   <value>/shared/hive/lib/hive-hwi-0.9.0.war</value>
   <description>This is the WAR file with the jsp content for Hive Web
 Interface</description>
 </property>

 </configuration>


 After running some DDL in hive, for example, no files are created
 underneath /tmp/hive (instead they go to /tmp/$USER, which is the
 default) - as if the custom hive-site.xml never existed.





Re: Custom hive-site.xml is ignored, how to find out why

2012-11-26 Thread Stephen Boesch
Hi,
  The problem may not have to do with hive-site.xml.  When I run the hive
client by itself it connects successfully to mysql and creates / reads
metadata.

The problem comes in when I run the metastore/thrift servers via:  hive
--service metastore  and  hive --service hiveserver.  As soon as I do that
and try to run the hive client (hive cli), it goes back to trying to use
derby.

There must be additional settings required for the hive metastore service
beyond those in my hive-site.xml.

Btw, from the OP, I already have the settings in hive-site.xml for connecting
via JDO to mysql.  I don't have the datanucleus ones. Are those required?
The docs I saw did not mention them.


2012/11/26 Shreepadma Venugopalan shreepa...@cloudera.com

 Hi Stephen,

 If you wish to set up a mysql metastore, you need to have the following in
 your hive-site.xml:

 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://MYHOST/metastore</value>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>hiveuser</value>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>password</value>
 </property>

 <property>
   <name>datanucleus.autoCreateSchema</name>
   <value>false</value>
 </property>

 <property>
   <name>datanucleus.fixedDatastore</name>
   <value>true</value>
 </property>


 Thanks.
 Shreepadma


 On Sat, Nov 24, 2012 at 8:41 PM, Stephen Boesch java...@gmail.com wrote:

 It appears that I was missing the *hive.metastore.uris* parameter.
  That one was not mentioned in the (several) blogs / tutorials that I had
 seen.


 2012/11/24 Stephen Boesch java...@gmail.com


 It seems the customized hive-site.xml is not being read. It lives under
 $HIVE_HOME/conf (which happens to be /shared/hive/conf). I have tried
 everything there is to try: set HIVE_CONF_DIR=/shared/hive/conf, added
 --config /shared/hive/conf, and added debugging to the hive shell script
 (bash -x) to ensure that the correct directory is actually getting used.

 The properties inside hive-site.xml are getting ignored: it was originally
 set to use mysql, but instead the default derby is getting used. Then I tried
 changing hive.metastore.local between true and false: no difference in
 behavior - it just goes to /tmp/$USER for creating the derby no matter what.

 I wondered whether hive-site.xml maybe had a syntax error and was
 getting ignored: so I removed everything except
 <configuration></configuration>: still no luck.

 This is the totally simplified hive-site.xml: it just has enough to try
 to see if it is actually being read/applied (and it is not..):


 steve@mithril:/shared/hive/conf$ cat hive-site.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <configuration>

 <!-- Hive Execution Parameters -->
 <property>
   <name>hive.metastore.local</name>
   <value>false</value>
 </property>

 <property>
   <name>hive.exec.scratchdir</name>
   <value>/tmp/hive/hive-${user.name}</value>
   <description>Scratch space for Hive jobs</description>
 </property>

 <property>
   <name>hive.hwi.war.file</name>
   <value>/shared/hive/lib/hive-hwi-0.9.0.war</value>
   <description>This is the WAR file with the jsp content for Hive Web
 Interface</description>
 </property>

 </configuration>


 After running some DDL in hive, for example, no files are created
 underneath /tmp/hive (instead they go to /tmp/$USER, which is the
 default) - as if the custom hive-site.xml never existed.







hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
I am seeing the following message in the logs (which are in the wrong place
under /tmp..)

hive-site.xml not found on classpath

My hive-site.xml is under the standard location  $HIVE_HOME/conf so this
should not happen.

Now, some posts have suggested that the HADOOP_CLASSPATH was mangled.  Mine is
not.

So what is the underlying issue here?

Thanks

stephenb


Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
I am running under user steve. The latest log (where this shows up) is
 /tmp/steve/hive.log


2012/11/29 Viral Bajaria viral.baja...@gmail.com

 Are you seeing this error when you run the hive cli, or in the tasktracker
 logs when you run a query?

 On Thu, Nov 29, 2012 at 12:42 AM, Stephen Boesch java...@gmail.comwrote:


 I am seeing the following message in the logs (which are in the wrong
 place under /tmp..)

 hive-site.xml not found on classpath

 My hive-site.xml is under the standard location  $HIVE_HOME/conf so this
 should not happen.

 Now, some posts have suggested that the HADOOP_CLASSPATH was mangled.  Mine
 is not.

 So what is the underlying issue here?

 Thanks

 stephenb





Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
Yes.


2012/11/29 Shreepadma Venugopalan shreepa...@cloudera.com

 Are you seeing this message when you bring up the standalone hive cli by
 running 'hive'?


 On Thu, Nov 29, 2012 at 12:56 AM, Stephen Boesch java...@gmail.comwrote:

 I am running under user steve. The latest log (where this shows up) is
  /tmp/steve/hive.log


 2012/11/29 Viral Bajaria viral.baja...@gmail.com

 Are you seeing this error when you run the hive cli, or in the
 tasktracker logs when you run a query?

 On Thu, Nov 29, 2012 at 12:42 AM, Stephen Boesch java...@gmail.comwrote:


 I am seeing the following message in the logs (which are in the wrong
 place under /tmp..)

  hive-site.xml not found on classpath

 My hive-site.xml is under the standard location  $HIVE_HOME/conf so
 this should not happen.

 Now, some posts have suggested that the HADOOP_CLASSPATH was mangled.  Mine
 is not.

 So what is the underlying issue here?

 Thanks

 stephenb







Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
I thought I mentioned in the posts that those were already set and verified..
but yes, in any case, that's the first thing to look at.

steve@mithril:~$ echo $HIVE_CONF_DIR
/shared/hive/conf
steve@mithril:~$ echo $HIVE_HOME
/shared/hive


2012/11/29 kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com

 Have you tried setting HIVE_HOME and HIVE_CONF_DIR?


 On Thu, Nov 29, 2012 at 2:46 PM, Stephen Boesch java...@gmail.com wrote:

 Yes.


 2012/11/29 Shreepadma Venugopalan shreepa...@cloudera.com

 Are you seeing this message when you bring up the standalone hive cli
 by running 'hive'?


 On Thu, Nov 29, 2012 at 12:56 AM, Stephen Boesch java...@gmail.comwrote:

 I am running under user steve. The latest log (where this shows up)
 is  /tmp/steve/hive.log


 2012/11/29 Viral Bajaria viral.baja...@gmail.com

 Are you seeing this error when you run the hive cli, or in the
 tasktracker logs when you run a query?

 On Thu, Nov 29, 2012 at 12:42 AM, Stephen Boesch java...@gmail.comwrote:


 I am seeing the following message in the logs (which are in the wrong
 place under /tmp..)

  hive-site.xml not found on classpath

 My hive-site.xml is under the standard location  $HIVE_HOME/conf so
 this should not happen.

 Now, some posts have suggested that the HADOOP_CLASSPATH was mangled.
  Mine is not.

 So what is the underlying issue here?

 Thanks

 stephenb








 --
 Swarnim



Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
Yes, I do mean the log is in the wrong location, since it was set to a
persistent path in $HIVE_CONF_DIR/hive-log4j.properties.

None of the files in that directory appear to be picked up properly:
neither the hive-site.xml nor the log4j.properties.

I have put echo statements into the 'hive' and 'hive-config.sh' shell
scripts, and the echo statements prove that HIVE_CONF_DIR is set properly:
 /shared/hive/conf

But even so the following problems occur:

   - the message “hive-site.xml not found on CLASSPATH” still appears
   - none of the hive-site.xml values are taking effect
   - the log4j.properties in that same directory is not taking effect.




2012/11/29 Bing Li sarah.lib...@gmail.com

 Hi, Stephen
 What did you mean by “the wrong place under /tmp” in
 “I am seeing the following message in the logs (which are in the wrong
 place under /tmp..)”?

 Did you mean that you set a different log dir but it didn't work?

 The log dir should be set in conf/hive-log4j.properties and
 conf/hive-exec-log4j.properties,
 and you can try to reset HIVE_CONF_DIR in conf/hive-env.sh with the
 ‘export’ command.
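
 For example, a minimal conf/hive-env.sh line (using the conf path you
 reported earlier in this thread):

 export HIVE_CONF_DIR=/shared/hive/conf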

 - Bing


 2012/11/30 Stephen Boesch java...@gmail.com

 I thought I mentioned in the posts that those were already set and
 verified.. but yes, in any case, that's the first thing to look at.

 steve@mithril:~$ echo $HIVE_CONF_DIR
 /shared/hive/conf
 steve@mithril:~$ echo $HIVE_HOME
 /shared/hive


 2012/11/29 kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com

 Have you tried setting HIVE_HOME and HIVE_CONF_DIR?


 On Thu, Nov 29, 2012 at 2:46 PM, Stephen Boesch java...@gmail.comwrote:

 Yes.


 2012/11/29 Shreepadma Venugopalan shreepa...@cloudera.com

 Are you seeing this message when you bring up the standalone hive cli
 by running 'hive'?


 On Thu, Nov 29, 2012 at 12:56 AM, Stephen Boesch java...@gmail.comwrote:

 I am running under user steve. The latest log (where this shows up)
 is  /tmp/steve/hive.log


 2012/11/29 Viral Bajaria viral.baja...@gmail.com

 Are you seeing this error when you run the hive cli, or in the
 tasktracker logs when you run a query?

 On Thu, Nov 29, 2012 at 12:42 AM, Stephen Boesch 
 java...@gmail.comwrote:


 I am seeing the following message in the logs (which are in the
 wrong place under /tmp..)

  hive-site.xml not found on classpath

 My hive-site.xml is under the standard location  $HIVE_HOME/conf so
 this should not happen.

 Now, some posts have suggested that the HADOOP_CLASSPATH was mangled.
  Mine is not.

 So what is the underlying issue here?

 Thanks

 stephenb








 --
 Swarnim






Re: hive-site.xml not found on classpath

2012-11-30 Thread Stephen Boesch
running 0.9.0 (you can see it from the classpath shown below):

steve@mithril:/shared/cdh4$ echo $HIVE_CONF_DIR
/shared/hive/conf
steve@mithril:/shared/cdh4$ ls -l $HIVE_CONF_DIR
total 152
-rw-r--r-- 1 steve steve 46053 2011-12-13 00:36 hive-default.xml.template
-rw-r--r-- 1 steve steve  1615 2012-11-13 23:37 hive-env.bullshit.sh
-rw-r--r-- 1 steve steve  1671 2012-11-28 01:43 hive-env.sh
-rw-r--r-- 1 steve steve  1593 2011-12-13 00:36 hive-env.sh.template
-rw-r--r-- 1 steve steve  1637 2011-12-13 00:36
hive-exec-log4j.properties.template
-rw-r--r-- 1 root  root   2056 2012-11-28 01:38 hive-log4j.properties
-rw-r--r-- 1 steve steve  2056 2012-03-25 12:49
hive-log4j.properties.template
-rw-r--r-- 1 steve steve  4415 2012-11-25 23:02 hive-site.xml
steve@mithril:/shared/cdh4$ echo $HIVE_HOME
/shared/hive
steve@mithril:/shared/cdh4$ echo $(which hive)
/shared/hive/bin/hive

also you can see the hive/conf is the first entry

After adding the debug statement:

classpath=*/shared/hive/conf:*
/shared/hive/lib/antlr-runtime-3.0.1.jar:/shared/hive/lib/commons-cli-1.2.jar:/shared/hive/lib/commons-codec-1.3.jar:/shared/hive/lib/commons-collections-3.2.1.jar:/shared/hive/lib/commons-dbcp-1.4.jar:/shared/hive/lib/commons-lang-2.4.jar:/shared/hive/lib/commons-logging-1.0.4.jar:/shared/hive/lib/commons-logging-api-1.0.4.jar:/shared/hive/lib/commons-pool-1.5.4.jar:/shared/hive/lib/datanucleus-connectionpool-2.0.3.jar:/shared/hive/lib/datanucleus-core-2.0.3.jar:/shared/hive/lib/datanucleus-enhancer-2.0.3.jar:/shared/hive/lib/datanucleus-rdbms-2.0.3.jar:/shared/hive/lib/derby-10.4.2.0.jar:/shared/hive/lib/guava-r09.jar:/shared/hive/lib/hbase-0.92.0.jar:/shared/hive/lib/hbase-0.92.0-tests.jar:/shared/hive/lib/hive-builtins-0.9.0.jar:/shared/hive/lib/hive-cli-0.9.0.jar:/shared/hive/lib/hive-common-0.9.0.jar:/shared/hive/lib/hive-contrib-0.9.0.jar:/shared/hive/lib/hive_contrib.jar:/shared/hive/lib/hive-exec-0.9.0.jar:/shared/hive/lib/hive-hbase-handler-0.9.0.jar:/shared/hive/lib/hive-hwi-0.9.0.jar:/shared/hive/lib/hive-jdbc-0.9.0.jar:/shared/hive/lib/hive-metastore-0.9.0.jar:/shared/hive/lib/hive-pdk-0.9.0.jar:/shared/hive/lib/hive-serde-0.9.0.jar:/shared/hive/lib/hive-service-0.9.0.jar:/shared/hive/lib/hive-shims-0.9.0.jar:/shared/hive/lib/jackson-core-asl-1.8.8.jar:/shared/hive/lib/jackson-jaxrs-1.8.8.jar:/shared/hive/lib/jackson-mapper-asl-1.8.8.jar:/shared/hive/lib/jackson-xc-1.8.8.jar:/shared/hive/lib/JavaEWAH-0.3.2.jar:/shared/hive/lib/jdo2-api-2.3-ec.jar:/shared/hive/lib/jline-0.9.94.jar:/shared/hive/lib/json-20090211.jar:/shared/hive/lib/libfb303-0.7.0.jar:/shared/hive/lib/libfb303.jar:/shared/hive/lib/libthrift-0.7.0.jar:/shared/hive/lib/libthrift.jar:/shared/hive/lib/log4j-1.2.16.jar:/shared/hive/lib/mysql-connector-java-5.1.18-bin.jar:/shared/hive/lib/slf4j-api-1.6.1.jar:/shared/hive/lib/slf4j-log4j12-1.6.1.jar:/shared/hive/lib/stringtemplate-3.1-b1.jar:/shared/hive/lib/zookeeper-3.4.3.jar:


But even so:

   - the log dir is still wrong (writing to /tmp/${user}/hive.log instead
   of $HIVE_HOME/logs)
   - the following message in the log file

2012-11-30 00:12:31,775 WARN  conf.HiveConf
(HiveConf.java:<clinit>(70)) - *hive-site.xml
not found on CLASSPATH*




2012/11/30 Bing Li sarah.lib...@gmail.com

 which version of hive do you use?

 Could you try to add the following debug line in bin/hive before hive
 actually executes, and see the result?

 *echo CLASSPATH=$CLASSPATH*

 if [ "$TORUN" = "" ]; then
    echo "Service $SERVICE not found"
    echo "Available Services: $SERVICE_LIST"
    exit 7
 else
    $TORUN "$@"
 fi

 The version I used is 0.9.0



 2012/11/30 Stephen Boesch java...@gmail.com

 Yes, I do mean the log is in the wrong location, since it was set to a
 persistent path in $HIVE_CONF_DIR/hive-log4j.properties.

 None of the files in that directory appear to be picked up properly:
 neither the hive-site.xml nor the log4j.properties.

 I have put echo statements into the 'hive' and 'hive-config.sh' shell
 scripts, and the echo statements prove that HIVE_CONF_DIR is set properly:
  /shared/hive/conf

 But even so the following problems occur:

    - the message “hive-site.xml not found on CLASSPATH” still appears
    - none of the hive-site.xml values are taking effect
    - the log4j.properties in that same directory is not taking effect.




 2012/11/29 Bing Li sarah.lib...@gmail.com

 Hi, Stephen
 What did you mean by “the wrong place under /tmp” in
 “I am seeing the following message in the logs (which are in the wrong
 place under /tmp..)”?

 Did you mean that you set a different log dir but it didn't work?

 The log dir should be set in conf/hive-log4j.properties and
 conf/hive-exec-log4j.properties,
 and you can try to reset HIVE_CONF_DIR in conf/hive-env.sh with the
 ‘export’ command.

 - Bing


 2012/11/30 Stephen Boesch java...@gmail.com

 I thought I mentioned in the posts that those were already set and
 verified.. but yes, in any case, that's the first thing to look at.

 steve@mithril:~$ echo $HIVE_CONF_DIR
 /shared/hive/conf

Re: hive-site.xml not found on classpath

2012-12-09 Thread Stephen Boesch
The first element of the classpath is the right one already.. but I STILL
get “hive-site.xml not found on CLASSPATH”.  Only hive gives me
issues; hdfs, mapred, and hbase are all running fine.

HADOOP_CLASSPATH=:*/shared/hive/conf:*
/shared/hive/lib/antlr-runtime-3.0.1.jar:/shared/hive/lib/commons-cli-1.2.jar:/shared/hive/lib/commons-codec-1.3.jar:/shared/hive/lib/commons-collections-3.2.1.jar:/shared/hive/lib/commons-dbcp-1.4.jar:/shared/hive/lib/commons-lang-2.4.jar:/shared/hive/lib/commons-logging-1.0.4.jar:/shared/hive/lib/commons-logging-api-1.0.4.jar:/shared/hive/lib/commons-pool-1.5.4.jar:/shared/hive/lib/datanucleus-connectionpool-2.0.3.jar:/shared/hive/lib/datanucleus-core-2.0.3.jar:/shared/hive/lib/datanucleus-enhancer-2.0.3.jar:/shared/hive/lib/datanucleus-rdbms-2.0.3.jar:/shared/hive/lib/derby-10.4.2.0.jar:/shared/hive/lib/guava-r09.jar:/shared/hive/lib/hbase-0.92.0.jar:/shared/hive/lib/hbase-0.92.0-tests.jar:/shared/hive/lib/hive-builtins-0.9.0.jar:/shared/hive/lib/hive-cli-0.9.0.jar:/shared/hive/lib/hive-common-0.9.0.jar:/shared/hive/lib/hive-contrib-0.9.0.jar:/shared/hive/lib/hive_contrib.jar:/shared/hive/lib/hive-exec-0.9.0.jar:/shared/hive/lib/hive-hbase-handler-0.9.0.jar:/shared/hive/lib/hive-hwi-0.9.0.jar:/shared/hive/lib/hive-jdbc-0.9.0.jar:/shared/hive/lib/hive-metastore-0.9.0.jar:/shared/hive/lib/hive-pdk-0.9.0.jar:/shared/hive/lib/hive-serde-0.9.0.jar:/shared/hive/lib/hive-service-0.9.0.jar:/shared/hive/lib/hive-shims-0.9.0.jar:/shared/hive/lib/jackson-core-asl-1.8.8.jar:/shared/hive/lib/jackson-jaxrs-1.8.8.jar:/shared/hive/lib/jackson-mapper-asl-1.8.8.jar:/shared/hive/lib/jackson-xc-1.8.8.jar:/shared/hive/lib/JavaEWAH-0.3.2.jar:/shared/hive/lib/jdo2-api-2.3-ec.jar:/shared/hive/lib/jline-0.9.94.

steve@mithril:/shared/hive/bin$ *ls -lrta /shared/hive/conf/hive-site.xml*
-rw-r--r-- 1 steve steve 4415 2012-11-25 23:02
/shared/hive/conf/hive-site.xml



2012/11/30 Lauren Yang lauren.y...@microsoft.com

  You can see if the classpath is being passed correctly to hadoop by
 putting in an echo statement around line 150 of the hive cli script where
 it passes the CLASSPATH variable to HADOOP_CLASSPATH.

 # pass classpath to hadoop

 export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH}


 You could also echo the classpath in the hadoop script (in your
 HADOOP_HOME\bin directory) to see if the classpath is being passed
 correctly at the time the cli jar is invoked.


 As far as the logs location, if this is not set in your hive-site.xml, you
 can set it by passing in HIVE_OPTS when you invoke the command line.

 Like so:

 export HIVE_OPTS="--hiveconf hive.log.dir=$HIVE_HOME/logs"

 Then run “hive”

 Or:

 Run “hive --hiveconf hive.log.dir=$HIVE_HOME/logs”

 Thanks,

 Lauren


 *From:* Stephen Boesch [mailto:java...@gmail.com]
 *Sent:* Friday, November 30, 2012 12:16 AM
 *To:* user@hive.apache.org
 *Subject:* Re: hive-site.xml not found on classpath


 running 0.9.0 (you can see it from the classpath shown below):


 steve@mithril:/shared/cdh4$ echo $HIVE_CONF_DIR

 /shared/hive/conf

 steve@mithril:/shared/cdh4$ ls -l $HIVE_CONF_DIR

 total 152

 -rw-r--r-- 1 steve steve 46053 2011-12-13 00:36 hive-default.xml.template

 -rw-r--r-- 1 steve steve  1615 2012-11-13 23:37 hive-env.bullshit.sh

 -rw-r--r-- 1 steve steve  1671 2012-11-28 01:43 hive-env.sh

 -rw-r--r-- 1 steve steve  1593 2011-12-13 00:36 hive-env.sh.template

 -rw-r--r-- 1 steve steve  1637 2011-12-13 00:36
 hive-exec-log4j.properties.template

 -rw-r--r-- 1 root  root   2056 2012-11-28 01:38 hive-log4j.properties

 -rw-r--r-- 1 steve steve  2056 2012-03-25 12:49
 hive-log4j.properties.template

 -rw-r--r-- 1 steve steve  4415 2012-11-25 23:02 hive-site.xml

 steve@mithril:/shared/cdh4$ echo $HIVE_HOME

 /shared/hive

 steve@mithril:/shared/cdh4$ echo $(which hive)

 /shared/hive/bin/hive


 also you can see the hive/conf is the first entry


 After adding the debug statement: 


 classpath=*/shared/hive/conf:*
 /shared/hive/lib/antlr-runtime-3.0.1.jar:/shared/hive/lib/commons-cli-1.2.jar:/shared/hive/lib/commons-codec-1.3.jar:/shared/hive/lib/commons-collections-3.2.1.jar:/shared/hive/lib/commons-dbcp-1.4.jar:/shared/hive/lib/commons-lang-2.4.jar:/shared/hive/lib/commons-logging-1.0.4.jar:/shared/hive/lib/commons-logging-api-1.0.4.jar:/shared/hive/lib/commons-pool-1.5.4.jar:/shared/hive/lib/datanucleus-connectionpool-2.0.3.jar:/shared/hive/lib/datanucleus-core-2.0.3.jar:/shared/hive/lib/datanucleus-enhancer-2.0.3.jar:/shared/hive/lib/datanucleus-rdbms-2.0.3.jar:/shared/hive/lib/derby-10.4.2.0.jar:/shared/hive/lib/guava-r09.jar:/shared/hive/lib/hbase-0.92.0.jar:/shared/hive/lib/hbase-0.92.0-tests.jar:/shared/hive/lib/hive-builtins-0.9.0.jar:/shared/hive/lib/hive-cli-0.9.0.jar:/shared/hive/lib/hive-common

Re: hive-site.xml not found on classpath

2012-12-09 Thread Stephen Boesch
I ended up patching HiveConf.java.  If hive-site.xml is not found
on the classpath, then:

   - an o.a.h.fs.Path object is created from
System.getenv("HIVE_CONF_DIR") + File.separator + "hive-site.xml"
   - the Path is passed to the base class Configuration.addResource - which,
   btw, accepts either resource URLs or Path objects

The problem was resolved with that change to HiveConf.java: the
hive-site.xml is loaded properly via the Path (instead of a URL), and as a
consequence the jdoConnectionURL and related properties for the mysql
metastore are loaded, allowing the metadata to be stored in mysql.





2012/12/9 Stephen Boesch java...@gmail.com

 The first element of the classpath is the right one already.. but I STILL
 get “hive-site.xml not found on CLASSPATH”.  Only hive gives me
 issues; hdfs, mapred, and hbase are all running fine.

 HADOOP_CLASSPATH=:*/shared/hive/conf:*
 /shared/hive/lib/antlr-runtime-3.0.1.jar:/shared/hive/lib/commons-cli-1.2.jar:/shared/hive/lib/commons-codec-1.3.jar:/shared/hive/lib/commons-collections-3.2.1.jar:/shared/hive/lib/commons-dbcp-1.4.jar:/shared/hive/lib/commons-lang-2.4.jar:/shared/hive/lib/commons-logging-1.0.4.jar:/shared/hive/lib/commons-logging-api-1.0.4.jar:/shared/hive/lib/commons-pool-1.5.4.jar:/shared/hive/lib/datanucleus-connectionpool-2.0.3.jar:/shared/hive/lib/datanucleus-core-2.0.3.jar:/shared/hive/lib/datanucleus-enhancer-2.0.3.jar:/shared/hive/lib/datanucleus-rdbms-2.0.3.jar:/shared/hive/lib/derby-10.4.2.0.jar:/shared/hive/lib/guava-r09.jar:/shared/hive/lib/hbase-0.92.0.jar:/shared/hive/lib/hbase-0.92.0-tests.jar:/shared/hive/lib/hive-builtins-0.9.0.jar:/shared/hive/lib/hive-cli-0.9.0.jar:/shared/hive/lib/hive-common-0.9.0.jar:/shared/hive/lib/hive-contrib-0.9.0.jar:/shared/hive/lib/hive_contrib.jar:/shared/hive/lib/hive-exec-0.9.0.jar:/shared/hive/lib/hive-hbase-handler-0.9.0.jar:/shared/hive/lib/hive-hwi-0.9.0.jar:/shared/hive/lib/hive-jdbc-0.9.0.jar:/shared/hive/lib/hive-metastore-0.9.0.jar:/shared/hive/lib/hive-pdk-0.9.0.jar:/shared/hive/lib/hive-serde-0.9.0.jar:/shared/hive/lib/hive-service-0.9.0.jar:/shared/hive/lib/hive-shims-0.9.0.jar:/shared/hive/lib/jackson-core-asl-1.8.8.jar:/shared/hive/lib/jackson-jaxrs-1.8.8.jar:/shared/hive/lib/jackson-mapper-asl-1.8.8.jar:/shared/hive/lib/jackson-xc-1.8.8.jar:/shared/hive/lib/JavaEWAH-0.3.2.jar:/shared/hive/lib/jdo2-api-2.3-ec.jar:/shared/hive/lib/jline-0.9.94.

 steve@mithril:/shared/hive/bin$ *ls -lrta /shared/hive/conf/hive-site.xml*
 -rw-r--r-- 1 steve steve 4415 2012-11-25 23:02
 /shared/hive/conf/hive-site.xml



 2012/11/30 Lauren Yang lauren.y...@microsoft.com

  You can see if the classpath is being passed correctly to hadoop by
 putting in an echo statement around line 150 of the hive cli script where
 it passes the CLASSPATH variable to HADOOP_CLASSPATH.

 # pass classpath to hadoop

 export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH}


 You could also echo the classpath in the hadoop script (in your
 HADOOP_HOME\bin directory) to see if the classpath is being passed
 correctly at the time the cli jar is invoked.


 As far as the logs location, if this is not set in your hive-site.xml,
 you can set it by passing in HIVE_OPTS when you invoke the command line.

 Like so:

 export HIVE_OPTS="--hiveconf hive.log.dir=$HIVE_HOME/logs"

 Then run “hive”

 Or:

 Run “hive --hiveconf hive.log.dir=$HIVE_HOME/logs”

 Thanks,

 Lauren


 *From:* Stephen Boesch [mailto:java...@gmail.com]
 *Sent:* Friday, November 30, 2012 12:16 AM
 *To:* user@hive.apache.org
 *Subject:* Re: hive-site.xml not found on classpath


 running 0.9.0 (you can see it from the classpath shown below):


 steve@mithril:/shared/cdh4$ echo $HIVE_CONF_DIR

 /shared/hive/conf

 steve@mithril:/shared/cdh4$ ls -l $HIVE_CONF_DIR

 total 152

 -rw-r--r-- 1 steve steve 46053 2011-12-13 00:36 hive-default.xml.template
 

 -rw-r--r-- 1 steve steve  1615 2012-11-13 23:37 hive-env.bullshit.sh

 -rw-r--r-- 1 steve steve  1671 2012-11-28 01:43 hive-env.sh

 -rw-r--r-- 1 steve steve  1593 2011-12-13 00:36 hive-env.sh.template

 -rw-r--r-- 1 steve steve  1637 2011-12-13 00:36
 hive-exec-log4j.properties.template

 -rw-r--r-- 1 root  root   2056 2012-11-28 01:38 hive-log4j.properties

 -rw-r--r-- 1 steve steve  2056 2012-03-25 12:49
 hive-log4j.properties.template

 -rw-r--r-- 1 steve steve  4415 2012-11-25 23:02 hive-site.xml

 steve@mithril:/shared/cdh4$ echo $HIVE_HOME

 /shared/hive

 steve@mithril:/shared/cdh4$ echo $(which hive)

 /shared/hive/bin/hive


 also you can see the hive/conf is the first entry


 After adding the debug statement: 


 classpath=*/shared/hive/conf:*
 /shared/hive/lib/antlr-runtime-3.0.1.jar:/shared/hive/lib/commons-cli-1.2.jar:/shared/hive/lib/commons-codec-1.3.jar:/shared/hive/lib

Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Stephen Boesch
Hi Ashutosh,
   I am interested in reviewing your windowing feature.  Can you be more
specific about which (a) tests and (b) src files constitute your additions
(there are lots of files there ;) )?

thanks

stephen boesch


2013/2/21 Ashutosh Chauhan hashut...@apache.org

 Kumar,

 If you are willing to be on the bleeding edge, this and much other
 partitioning and windowing functionality is being developed by some of us
 in a branch over at:
 https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
 Check out this branch, build hive, and then you can have row_number()
 functionality. Look in
 ql/src/test/queries/clientpositive/ptf_general_queries.q, which has about 60
 or so example queries demonstrating various capabilities which we have
 already working (including row_number).
 We hope to have this branch merged into trunk soon.
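
 For instance, once the branch is built, a query of the shape below covers
 the multi-column partitioning and ordering described in the original
 question (the table and column names here are illustrative, not from the
 thread):

 SELECT id,
        ROW_NUMBER() OVER (PARTITION BY dept, city
                           ORDER BY salary DESC, hire_date) AS rn
 FROM employees;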

 Hope it helps,
 Ashutosh
 On Wed, Feb 20, 2013 at 11:33 PM, kumar mr kumar...@aol.com wrote:

 Hi,

  This is Kumar, and this is my first question in this group.

  I have a requirement to implement ROW_NUMBER() from Teradata in Hive,
 where partitioning happens on multiple columns along with multiple-column
 ordering.
 It can be easily implemented in Hadoop MR, but I have to do it in Hive. A
 UDF could assign the same rank to a grouping key, considering the dataset
 is small, but the ordering would need to be done in a prior step.
 Can we do this in a lot simpler way?

  Thanks in advance.

  Regards,
 Kumar





No such file or directory error on simple query

2013-03-02 Thread Stephen Boesch
I am struggling with a "no such file or directory" exception when running
a simple query in hive.  It is unfortunate that the actual path was not
included with the stacktrace: the following is all that is provided.

I have a query that fails with the following error when done as   hive -e
'select * from table'. But it works properly when done within the hive
shell.  But at the same time, doing hive> select * from table2; fails
with the same error message.

I am also seeing this error both for hdfs files and for s3 files.  Without
any path information it is very difficult and time consuming to track this
down.

Any pointers appreciated.


Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapred.reduce.tasks=number
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
files.
Execution log at:
/tmp/impala/impala_20130302095252_79ce9404-6af7-405b-8b06-849fe6c5328d.log
ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:568)
at
org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:411)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:501)
at
org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:733)
at
org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:692)
at org.apache.hadoop.mapred.JobClient.access$400(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:895)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:895)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:869)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception
'org.apache.hadoop.io.nativeio.NativeIOException(No such file or directory)'
Execution failed with exit status: 1
Obtaining error information


Re: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Stephen Boesch
@Peter  Does the query plan demonstrate that the 3M-row table is being
map-joined and the 400M-row table streamed through? That is what you want,
but you might need to fiddle with hints to get it to happen.

Details:
Read the uuids of feed into an in-memory map on all nodes (mapjoin).
Stream the 400M message records through the in-memory maps, copying
ids from the "all feed uuids" map to a "matched feed uuids" map for
entries that have matches in the messages.

 Note: this way the 400M rows are only read once on the cluster.

You can see whether hive can manage this, or else write a custom m/r job
to do it.
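
In HiveQL terms, the left-outer-join rewrite that Michael refers to below
has the shape (assuming uuid is the join key in both tables, as in your
query):

SELECT f.uuid
FROM feed f
LEFT OUTER JOIN message m ON f.uuid = m.uuid
WHERE m.uuid IS NULL;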



2013/5/5 Peter Chu pete@outlook.com

 It works, but it takes a very long time because the subquery in the NOT IN
 contains 400 million rows (the message table in the example) and the feed
 table contains 3 million rows.

 SELECT uuid from feed f WHERE f.uuid NOT IN (SELECT uuid FROM message);

  Date: Sun, 5 May 2013 20:25:15 -0700
  From: michaelma...@yahoo.com
  Subject: Re: Hive QL - NOT IN, NOT EXIST
  To: user@hive.apache.org

 
 
  --- On Sun, 5/5/13, Peter Chu pete@outlook.com wrote:
 
   I am wondering if there is any way to do this without resorting to
   using left outer join and finding nulls.
 
  I have found this to be an acceptable substitute. Is it not working for
 you?
 



Re: Hive QL - NOT IN, NOT EXIST

2013-05-06 Thread Stephen Boesch
Hi Peter,
   Looks like mapjoin does not work with outer join, so streamtable is
instead a possible approach. You would stream the larger table past the
smaller one:

 Can you see whether the following helps your perf issue?

select /*+ streamtable(message) */ f.uuid  from message m right outer join
feed f on m.uuid = f.uuid where m.uuid is null;




2013/5/5 Peter Chu pete@outlook.com

 Thanks, Stephen,

 I do not quite understand what you mean by "stream", specifically "Stream
 the 400M message records through the in-memory maps."
 Can you please elaborate?

 Also, can you use MAPJOIN on left outer join?

 Peter

 --
 Date: Sun, 5 May 2013 21:44:37 -0700

 Subject: Re: Hive QL - NOT IN, NOT EXIST
 From: java...@gmail.com
 To: user@hive.apache.org



 @Peter  Does the query plan demonstrate that the 3M-row table is being
 map-joined and the 400M-row table streamed through? That is what you want,
 but you might need to fiddle with hints to get it to happen.

 Details:
 Read the uuids of feed into an in-memory map on all nodes (mapjoin).
 Stream the 400M message records through the in-memory maps, copying
 ids from the "all feed uuids" map to a "matched feed uuids" map for
 entries that have matches in the messages.

  Note: this way the 400M rows are only read once on the cluster.

 You can see whether hive can manage this, or else write a custom m/r job
 to do it.



 2013/5/5 Peter Chu pete@outlook.com

 It works, but it takes a very long time because the subquery in the NOT IN
 contains 400 million rows (the message table in the example) and the feed
 table contains 3 million rows.

 SELECT uuid from feed f WHERE f.uuid NOT IN (SELECT uuid FROM message);

  Date: Sun, 5 May 2013 20:25:15 -0700
  From: michaelma...@yahoo.com
  Subject: Re: Hive QL - NOT IN, NOT EXIST
  To: user@hive.apache.org

 
 
  --- On Sun, 5/5/13, Peter Chu pete@outlook.com wrote:
 
   I am wondering if there is any way to do this without resorting to
   using left outer join and finding nulls.
 
  I have found this to be an acceptable substitute. Is it not working for
 you?
 





Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
We have a few dozen files that need to be made available to all
mappers/reducers in the cluster while running hive transformation steps.

It seems that add archive does not unarchive the entries and thus make them
available directly on the default file path - and that is what we are
looking for.

To illustrate:

   add file modelfile.1;
   add file modelfile.2;
   ..
add file modelfile.N;

  Then, our model that is invoked during the transformation step *does *have
correct access to its model files in the default path.

But .. those model files take low *minutes* to all load..

Instead, when we try:
   add archive modelArchive.tgz;

the problem is that the archive apparently does not get exploded.

For example, I have an archive that contains shell scripts under the hive
directory stored inside.  I am *not *able to access hive/my-shell-script.sh
after adding the archive. Specifically, the following fails:

$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
appminer/bin/launch-quixey_to_xml.sh

from (select transform (aappname,qappname)
*using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

Cannot run program hive/parse_qx.py: java.io.IOException: error=2,
No such file or directory
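
One thing worth testing (this is an assumption about how the distributed
cache handles archives, not something verified in this thread): archives
added via add archive are typically unpacked on the task nodes into a
directory named after the archive file, so the script may be reachable
through that prefix, e.g.:

from (select transform (aappname, qappname)
using 'modelArchive.tgz/hive/parse_qx.py' as (aappname2 string, qappname2 string)
from eqx) o insert overwrite table c select o.aappname2, o.qappname2;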


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
@Stephen:  given that the 'relative' path for hive is from a local downloads
directory on each local tasktracker in the cluster, it was my thought that
if the archive were actually being expanded, then
somedir/somefileinthearchive should work.  I will go ahead and test this
assumption.

In the meantime, is there any facility available in hive for making
archived files available to hive jobs?  archive or hadoop archive (har)
etc?


2013/6/20 Stephen Sprague sprag...@gmail.com

 what would be interesting would be to run a little experiment and find out
 what the default PATH is on your data nodes.  How much of a pain would it
 be to run a little python script to print to stderr the value of the
 environmental variable $PATH and $PWD (or the shell command 'pwd') ?

 that's of course going through normal channels of add file.

 the thing is, given you're using a relative path (hive/parse_qx.py), you
 need to know what the current directory is when the process runs on the
 data nodes.




 On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.com wrote:


 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation steps .

 It seems that add archive does not unarchive the entries and thus make them
 available directly on the default file path - and that is what we are
 looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;
..
 add file modelfile.N;

   Then, our model that is invoked during the transformation step *does *have
 correct access to its model files in the default path.

 But .. those model files take low *minutes* to all load..

 Instead, when we try:
    add archive modelArchive.tgz;

 the problem is that the archive apparently does not get exploded.

 For example, I have an archive that contains shell scripts under the
 hive directory stored inside.  I am *not *able to access
 hive/my-shell-script.sh after adding the archive. Specifically, the
 following fails:

 $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
 -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
 appminer/bin/launch-quixey_to_xml.sh

 from (select transform (aappname,qappname)
 *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
 from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

 Cannot run program hive/parse_qx.py: java.io.IOException: error=2, No such 
 file or directory







Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Thanks for the tip on add file where the file is a directory. I will try that.


2013/6/20 Stephen Sprague sprag...@gmail.com

 I personally only know of adding a .jar file via add archive, but my
 experience there is very limited.  I believe if you 'add file' and the file
 is a directory, it'll recursively take everything underneath, but I know of
 nothing that inflates or untars things on the remote end automatically.

 I would 'add file' your python script and then, within that, untar your
 tarball to get at your model data. It's just a matter of figuring out the
 path to that tarball, which is kinda up in the air when it's added as 'add
 file'.  Yeah, local downloads directory.  What the literal path is, is
 what I'd like to know. :)


 On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.com wrote:


 @Stephen:  given that the 'relative' path for hive is from a local downloads
 directory on each local tasktracker in the cluster, it was my thought that
 if the archive were actually being expanded, then
 somedir/somefileinthearchive should work.  I will go ahead and test this
 assumption.

 In the meantime, is there any facility available in hive for making
 archived files available to hive jobs?  archive or hadoop archive (har)
 etc?


 2013/6/20 Stephen Sprague sprag...@gmail.com

 what would be interesting would be to run a little experiment and find
 out what the default PATH is on your data nodes.  How much of a pain would
 it be to run a little python script to print to stderr the value of the
 environmental variable $PATH and $PWD (or the shell command 'pwd') ?

 that's of course going through normal channels of add file.

 the thing is given you're using a relative path hive/parse_qx.py  you
 need to know what the current directory is when the process runs on the
 data nodes.




 On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.comwrote:


 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation steps .

 It seems that add archive does not unarchive the entries and make them
 available directly on the default file path - and that is what we are
 looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;
..
 add file modelfile.N;

   Then, our model that is invoked during the transformation step *does
 *have correct access to its model files in the default path.

 But .. loading all of those model files takes low *minutes* ..

 instead when we try:
add archive  modelArchive.tgz.

 The problem is the archive does not get exploded apparently ..

 I have an archive for example that contains shell scripts under the
 hive directory stored inside.  I am *not *able to access
 hive/my-shell-script.sh  after adding the archive. Specifically the
 following fails:

 $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
 -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
 appminer/bin/launch-quixey_to_xml.sh

 from (select transform (aappname,qappname)
 *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
 from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

 Cannot run program hive/parse_qx.py: java.io.IOException: error=2, No 
 such file or directory
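
A sketch of the pattern Stephen Sprague describes - ship both the script and
the tarball with add file, and have the script inflate the tarball itself.
The local paths are placeholders, and the untar step inside parse_qx.py is an
assumption about how the script would be written:

  add file /local/path/parse_qx.py;
  add file /local/path/modelArchive.tgz;

  -- parse_qx.py is assumed to run the equivalent of tar -xzf modelArchive.tgz
  -- (both files land in the task's working directory) before reading its
  -- model files
  from (select transform (aappname, qappname)
  using 'parse_qx.py' as (aappname2 string, qappname2 string)
  from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;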









Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Stephen:  would you be willing to share an example of specifying a
directory as the add file target?  I have not seen this working.

I have attempted to use it as follows:

*We will access a script within the hivetry directory located here:*
hive> ! ls -l  /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
-rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
/opt/am/ver/1.0/hive/hivetry/classifier_wf.py

*Add the directory  to hive:*
hive> add file /opt/am/ver/1.0/hive/hivetry;
Added resource: /opt/am/ver/1.0/hive/hivetry

*Attempt to run transform query using that script:*

*Attempt one: use the script name unqualified:*

hive> from (select transform (aappname,qappname) using
'classifier_wf.py' as (aappname2 string, qappname2 string) from eqx )
o insert overwrite table c select o.aappname2, o.qappname2;

(Failed:   Caused by: java.io.IOException: Cannot run program
classifier_wf.py: java.io.IOException: error=2, No such file or
directory)


*Attempt two: use the script name with the directory name prefix: *
hive> from (select transform (aappname,qappname) using
'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

(Failed:   Caused by: java.io.IOException: Cannot run program
hive/classifier_wf.py: java.io.IOException: error=2, No such file or
directory)




2013/6/20 Stephen Sprague sprag...@gmail.com

 yeah.  the archive isn't unpacked on the remote side. I think add archive
 is mostly used for finding java packages since CLASSPATH will reference the
 archive (and as such there is no need to expand it.)


 On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch java...@gmail.com wrote:

 thx for the tip on add file where file is directory. I will try
 that.


 2013/6/20 Stephen Sprague sprag...@gmail.com

 i personally only know of adding a .jar file via add archive but my
 experience there is very limited.  i believe if you 'add file' and the file
 is a directory it'll recursively take everything underneath but i know of
 nothing that inflates or un tars things on the remote end automatically.

 i would 'add file' your python script and then within that untar your
 tarball to get at your model data. its just the matter of figuring out the
 path to that tarball that's kinda up in the air when its added as 'add
 file'.  Yeah, a local downloads directory.  What's the literal path? That's
 what i'd like to know. :)


 On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.comwrote:


 @Stephen:  given the  'relative' path for hive is from a local
 downloads directory on each local tasktracker in the cluster,  it was my
 thought that if the archive were actually being expanded then
 somedir/somefileinthearchive  should work.  I will go ahead and test this
 assumption.

 In the meantime, is there any facility available in hive for making
 archived files available to hive jobs?  archive or hadoop archive (har)
 etc?


 2013/6/20 Stephen Sprague sprag...@gmail.com

 what would be interesting would be to run a little experiment and find
 out what the default PATH is on your data nodes.  How much of a pain would
 it be to run a little python script to print to stderr the value of the
 environmental variable $PATH and $PWD (or the shell command 'pwd') ?

 that's of course going through normal channels of add file.

 the thing is given you're using a relative path hive/parse_qx.py
 you need to know what the current directory is when the process runs on
 the data nodes.




 On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.comwrote:


 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation steps 
 .

  It seems that add archive does not unarchive the entries and make them
  available directly on the default file path - and that is what we are
  looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;
..
 add file modelfile.N;

   Then, our model that is invoked during the transformation step *does
 *have correct access to its model files in the default path.

 But .. loading all of those model files takes low *minutes* ..

 instead when we try:
add archive  modelArchive.tgz.

 The problem is the archive does not get exploded apparently ..

 I have an archive for example that contains shell scripts under the
 hive directory stored inside.  I am *not *able to access
 hive/my-shell-script.sh  after adding the archive. Specifically the
 following fails:

 $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
 -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
 appminer/bin/launch-quixey_to_xml.sh

 from (select transform (aappname,qappname)
 *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
 from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

 Cannot run program hive/parse_qx.py: java.io.IOException: error=2, No 
 such file or directory











Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Good eyes, Ramki!  Thanks - the directory in place of a filename appears to
be working.  The script is getting loaded now using Attempt two, i.e.
hivetry/classifier_wf.py as the script path.

thanks again.

stephenb


2013/6/20 Ramki Palle ramki.pa...@gmail.com

 In *Attempt two*, are you not supposed to use hivetry as the
 directory?

 Maybe you should try giving the full path
 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py and see if it works.

 Regards,
 Ramki.


 On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch java...@gmail.com wrote:


 Stephen:  would you be willing to share an example of specifying a
 directory as the add file target?  I have not seen this working.

 I have attempted to use it as follows:

 *We will access a script within the hivetry directory located here:*
 hive> ! ls -l  /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
 -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py

 *Add the directory  to hive:*
 hive> add file /opt/am/ver/1.0/hive/hivetry;
 Added resource: /opt/am/ver/1.0/hive/hivetry

 *Attempt to run transform query using that script:*

 *Attempt one: use the script name unqualified:*

 hive> from (select transform (aappname,qappname) using 'classifier_wf.py' 
 as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table 
 c select o.aappname2, o.qappname2;


 (Failed:   Caused by: java.io.IOException: Cannot run program 
 classifier_wf.py: java.io.IOException: error=2, No such file or directory)


 *Attempt two: use the script name with the directory name prefix: *

 hive> from (select transform (aappname,qappname) using 
 'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o 
 insert overwrite table c select o.aappname2, o.qappname2;


 (Failed:   Caused by: java.io.IOException: Cannot run program 
 hive/classifier_wf.py: java.io.IOException: error=2, No such file or 
 directory)





 2013/6/20 Stephen Sprague sprag...@gmail.com

 yeah.  the archive isn't unpacked on the remote side. I think add
 archive is mostly used for finding java packages since CLASSPATH will
 reference the archive (and as such there is no need to expand it.)


 On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch java...@gmail.comwrote:

 thx for the tip on add file where file is directory. I will try
 that.


 2013/6/20 Stephen Sprague sprag...@gmail.com

 i personally only know of adding a .jar file via add archive but my
 experience there is very limited.  i believe if you 'add file' and the 
 file
 is a directory it'll recursively take everything underneath but i know of
 nothing that inflates or un tars things on the remote end automatically.

 i would 'add file' your python script and then within that untar your
 tarball to get at your model data. its just the matter of figuring out the
 path to that tarball that's kinda up in the air when its added as 'add
 file'.  Yeah, a local downloads directory.  What's the literal path? That's
 what i'd like to know. :)


 On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.comwrote:


 @Stephen:  given the  'relative' path for hive is from a local
 downloads directory on each local tasktracker in the cluster,  it was my
 thought that if the archive were actually being expanded then
 somedir/somefileinthearchive  should work.  I will go ahead and test this
 assumption.

 In the meantime, is there any facility available in hive for making
 archived files available to hive jobs?  archive or hadoop archive (har)
 etc?


 2013/6/20 Stephen Sprague sprag...@gmail.com

 what would be interesting would be to run a little experiment and
 find out what the default PATH is on your data nodes.  How much of a 
 pain
 would it be to run a little python script to print to stderr the value 
 of
 the environmental variable $PATH and $PWD (or the shell command 'pwd') ?

 that's of course going through normal channels of add file.

 the thing is given you're using a relative path hive/parse_qx.py
 you need to know what the current directory is when the process runs 
 on
 the data nodes.




 On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch 
 java...@gmail.comwrote:


 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation 
 steps .

 It seems the add archive  does not make the entries unarchived
 and thus available directly on the default file path - and that is 
 what we
 are looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;
..
 add file modelfile.N;

   Then, our model that is invoked during the transformation step *does
 *have correct access to its model files in the default path.

 But .. loading all of those model files takes low *minutes* ..

 instead when we try:
add archive  modelArchive.tgz.

 The problem is the archive does not get exploded apparently ..

 I have an archive for example that contains shell scripts under the
 hive directory stored inside.
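
Putting the resolution together: add the directory (its contents are shipped
recursively), then reference the script by a directory-qualified relative
path. A consolidated sketch using the paths from this thread:

  hive> add file /opt/am/ver/1.0/hive/hivetry;
  Added resource: /opt/am/ver/1.0/hive/hivetry

  hive> from (select transform (aappname, qappname)
  using 'hivetry/classifier_wf.py' as (aappname2 string, qappname2 string)
  from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;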

Re: hive query is very slow,why?

2013-07-18 Thread Stephen Boesch
one mapper.  how big is the table?


2013/7/18 ch huang justlo...@gmail.com

 i wait long time,no result ,why hive is so slow?

 hive> select cookie,url,ip,source,vsid,token,residence,edate from
 hb_cookie_history where edate >= '1371398400500' and edate <= '1371400200500';
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_1374138311742_0007, Tracking URL =
 http://CH22:8088/proxy/application_1374138311742_0007/
 Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1374138311742_0007
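
With a single mapper there is no parallelism at all. If hb_cookie_history is
a plain HDFS-backed table, the mapper count is governed by the input splits;
a sketch of the usual knobs (the values are illustrative, not taken from the
thread):

  set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  -- a smaller maximum split size yields more, smaller splits - hence more mappers
  set mapred.max.split.size=134217728;

If the table is instead backed by HBase (the hb_ prefix suggests it may be),
splits follow the HBase regions, and a single region would likewise produce
a single mapper.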



Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
Views should theoretically not incur performance penalties: they simply
represent queries. Are there situations where things are not that simple -
i.e. where views may actually result in different execution plans than the
underlying sql?

Additionally, are there views-related bugs that we should be aware of that
would limit the occasions that we could use them?

Thanks for the pointers.
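
One empirical check, sketched below: compare plans with EXPLAIN. If the view
is purely logical, the plan for a query against the view should match the
plan for the equivalent query against the base table (the table and view
names here are hypothetical):

  create view v as select col_a, col_b from t where dt = '2013-08-20';
  explain select col_a from v;
  explain select col_a from t where dt = '2013-08-20';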


Re: Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
Thanks v much Ricky.  Is this fixed in hive 0.11 - or going to be later
i.e. 0.12?


2013/8/20 Ricky Saltzer ri...@cloudera.com

 Although this is already fixed in the next and upcoming impala release,
 you might want to be aware of the following view limitation.

 https://issues.cloudera.org/browse/IMPALA-495
 On Aug 20, 2013 7:16 PM, Stephen Boesch java...@gmail.com wrote:

 Views should theoretically not incur performance penalties: they simply
 represent queries. Are there situations where things are not that simple -
 i.e. where views may actually result in different execution plans than the
 underlying sql?

 Additionally, are there views-related bugs that we should be aware of
 that would limit the occasions that we could use them?

 Thanks for the pointers.





Re: Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
Thanks for the inputs, Edward and Ricky.  I did look at the relevant source
code for the version we are using (0.9), and it does not appear there would be
any impact of views on the underlying sql - it does column aliasing and
checks that the query partition columns match the partition columns of the
underlying tables.  Neither would lead to incorrect partition pruning.



2013/8/20 Ricky Saltzer ri...@cloudera.com

 My apologies, being on both a Hive and Impala mailing list can be
 confusing ;).


 On Tue, Aug 20, 2013 at 10:40 PM, Edward Capriolo 
 edlinuxg...@gmail.comwrote:

 Views are logical . The view is compiled and has no penalty over the
 standard query.


 On Tuesday, August 20, 2013, Ricky Saltzer ri...@cloudera.com wrote:
  Since this bug was in Impala's query planner, I'm sure Hive is
 unaffected.
 
  On Aug 20, 2013 10:15 PM, Stephen Boesch java...@gmail.com wrote:
 
  Thanks v much Ricky.  Is this fixed in hive 0.11 - or going to be
 later i.e. 0.12?
 
  2013/8/20 Ricky Saltzer ri...@cloudera.com
 
  Although this is already fixed in the next and upcoming impala
 release, you might want to be aware of the following view limitation.
 
  https://issues.cloudera.org/browse/IMPALA-495
 
  On Aug 20, 2013 7:16 PM, Stephen Boesch java...@gmail.com wrote:
 
  Views should theoretically not incur performance penalties: they
 simply represent queries. Are there situations where things are not that
 simple - i.e. where views may actually result in different execution plans
 than the underlying sql?
  Additionally, are there views-related bugs that we should be aware
 of that would limit the occasions that we could use them?
  Thanks for the pointers.
 
 
 




 --
 Ricky Saltzer
 Tools Developer
 http://www.cloudera.com





BNF for Hive Views

2013-08-25 Thread Stephen Boesch
It appears a bit challenging to find the BNFs for the hive DDLs.  After a
few Google searches, the following popped up - for cdh3, and only for a
subset of table creation statements.


http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html

Is there an updated and more complete DDL BNF reference?  Although my
present need is for views (and specifically how to impose a schema on a
view), the BNF for other statements would also be helpful.

Thanks,

stephenb


Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
I was already well familiar with the content of the links you provided.   I
have a specific question about the BNF for views (and potentially other
ddl/dml) that does not appear to be addressed  . Thanks.



2013/8/25 Lefty Leverenz leftylever...@gmail.com

 Let me introduce you to the Hive wiki.

- Hive wiki home page:
https://cwiki.apache.org/confluence/display/Hive/Home
- Language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
- DDL:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
- Views:

 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropView
   - Excerpt:  A view's schema is frozen at the time the view is
   created; subsequent changes to underlying tables (e.g. adding a column)
   will not be reflected in the view's schema. If an underlying table is
   dropped or changed in an incompatible fashion, subsequent attempts to 
 query
   the invalid view will fail.
   - New in Hive 0.11:
   
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterViewAsSelect
- SELECT:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select

 The wiki is still a work in progress, but you'll find more DDL information
 than in the old Hive xdocs that Cloudera provides.  Everything in the xdocs
 is in the wiki now (except for some nifty headings in the CREATE TABLE
 section, which ought to be added to the wiki).

 -- Lefty


 On Sun, Aug 25, 2013 at 4:38 PM, Stephen Boesch java...@gmail.com wrote:


 It appears a bit challenging to find the BNFs for the hive DDLs.  After
 a few Google searches, the following popped up - for cdh3, and only for a
 subset of table creation statements.



 http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html

 Is there an updated and more complete DDL BNF reference?  Although my
 present need is for views (and specifically how to impose a schema on a
 view), the BNF for other statements would also be helpful.

 Thanks,

 stephenb






Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
yes i had read and re-read it.   I do have a specific reason for wishing to
view the bnf. thanks.


2013/8/25 Lefty Leverenz leftylever...@gmail.com

 Have you tried the Views chapter in the O'Reilly book Programming Hive
 by Rutherglen, Wampler, and Capriolo?

 -- Lefty



 On Mon, Aug 26, 2013 at 12:14 AM, Stephen Boesch java...@gmail.comwrote:

 I was already well familiar with the content of the links you provided.
 I have a specific question about the BNF for views (and potentially other
 ddl/dml) that does not appear to be addressed  . Thanks.



 2013/8/25 Lefty Leverenz leftylever...@gmail.com

 Let me introduce you to the Hive wiki.

- Hive wiki home page:
https://cwiki.apache.org/confluence/display/Hive/Home
- Language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
- DDL:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
- Views:

 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropView
   - Excerpt:  A view's schema is frozen at the time the view is
   created; subsequent changes to underlying tables (e.g. adding a 
 column)
   will not be reflected in the view's schema. If an underlying table is
   dropped or changed in an incompatible fashion, subsequent attempts to 
 query
   the invalid view will fail.
   - New in Hive 0.11:
   
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterViewAsSelect
- SELECT:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select

 The wiki is still a work in progress, but you'll find more DDL
 information than in the old Hive xdocs that Cloudera provides.  Everything
 in the xdocs is in the wiki now (except for some nifty headings in the
 CREATE TABLE section, which ought to be added to the wiki).

 -- Lefty


 On Sun, Aug 25, 2013 at 4:38 PM, Stephen Boesch java...@gmail.comwrote:


 It appears a bit challenging to find the BNFs for the hive DDLs.
  After a few Google searches, the following popped up - for cdh3, and only
 for a subset of table creation statements.



 http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html

 Is there an updated and more complete DDL BNF reference?  Although my
 present need is for views (and specifically how to impose a schema on a
 view), the BNF for other statements would also be helpful.

 Thanks,

 stephenb









Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
The antlr file (Hive.g) is providing the info I need for this specific
case, but if a BNF exists a pointer would still be helpful. Thx


2013/8/25 Stephen Boesch java...@gmail.com

 yes i had read and re-read it.   I do have a specific reason for wishing
 to view the bnf. thanks.


 2013/8/25 Lefty Leverenz leftylever...@gmail.com

 Have you tried the Views chapter in the O'Reilly book Programming Hive
 by Rutherglen, Wampler, and Capriolo?

 -- Lefty



 On Mon, Aug 26, 2013 at 12:14 AM, Stephen Boesch java...@gmail.comwrote:

 I was already well familiar with the content of the links you provided.
   I have a specific question about the BNF for views (and potentially other
 ddl/dml) that does not appear to be addressed  . Thanks.



 2013/8/25 Lefty Leverenz leftylever...@gmail.com

 Let me introduce you to the Hive wiki.

- Hive wiki home page:
https://cwiki.apache.org/confluence/display/Hive/Home
- Language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
- DDL:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
- Views:

 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropView
   - Excerpt:  A view's schema is frozen at the time the view is
   created; subsequent changes to underlying tables (e.g. adding a 
 column)
   will not be reflected in the view's schema. If an underlying table is
   dropped or changed in an incompatible fashion, subsequent attempts 
 to query
   the invalid view will fail.
   - New in Hive 0.11:
   
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterViewAsSelect
- SELECT:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select

 The wiki is still a work in progress, but you'll find more DDL
 information than in the old Hive xdocs that Cloudera provides.  Everything
 in the xdocs is in the wiki now (except for some nifty headings in the
 CREATE TABLE section, which ought to be added to the wiki).

 -- Lefty


 On Sun, Aug 25, 2013 at 4:38 PM, Stephen Boesch java...@gmail.comwrote:


 It appears a bit challenging to find the BNFs for the hive DDLs.
  After a few Google searches, the following popped up - for cdh3, and only
 for a subset of table creation statements.



 http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html

 Is there an updated and more complete DDL BNF reference?  Although my
 present need is for views (and specifically how to impose a schema on a
 view), the BNF for other statements would also be helpful.

 Thanks,

 stephenb










Pseudo column for the entire Line/Row ?

2013-08-30 Thread Stephen Boesch
I am writing a UDF that will perform validation on the input row and shall
require access to every column in the row (or alternatively simply to the
unparsed/pre-processed line).

Is there any way to achieve this?  Or will it be simply necessary to
declare an overloaded evaluate() method with a signature comprising every
field?

Suggestion of alternative tactics also welcomed.


thanks

stephenb
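
One alternative tactic, sketched below: rather than enumerating every field
in an evaluate() signature, stream the whole row to an external validator
with transform (*), which serializes all columns (tab-delimited by default)
as one line per row. The script name and output columns are hypothetical:

  select transform (*)
  using 'validate_row.py'
  as (is_valid string, reason string)
  from my_table;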


Re: DISCUSS: Hive language manual to be source control managed

2013-09-01 Thread Stephen Boesch
Will this allow BNFs for the DDL / DML to be provided and kept up to date
more readily?


2013/9/1 Edward Capriolo edlinuxg...@gmail.com

 Over the past few weeks I have taken several looks over documents in our
 wiki.
 The page that strikes me as alarmingly poor is the:
 https://cwiki.apache.org/Hive/languagemanual.html

 This page has several critical broken links such as
 https://cwiki.apache.org/Hive/languagemanual-groupby.html
 https://cwiki.apache.org/Hive/languagemanual-transform.html

 The language manual used to be in decent shape. At times it had omissions
 or was not clear about what version something appeared in, but it was very
 usable.

 A long time ago I had begun and completed moving the wiki documentation
 inside the project as xdoc. After completion, several had a problem with
 the xdocs approach. The main complaint was the xdoc approach was too
 cumbersome. (However we have basically had a 'turn over' and since that
 time I am one of the few active committers)

 The language manual is in very poor shape at the moment with broken links,
 incorrect content, incomplete content, and poor coverage of the actual
 languages. IMHO the attempts to crowd-source this documentation have failed.
 Having a good concise language manual is critical to the success and
 adoption of hive.

 I do not believe all of our documentation needs to be in xdoc (as in every
 udf, or every input format) but I believe the language manual surely does.

 Please review the current wiki and discuss the concept of moving the
 language manual to source control, or suggest other options.

 Thank you,
 Edward





Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
We have a UDF that is configured via a small properties file.  What are the
options for distributing the file to the task nodes?  Also, we want to be
able to update the file frequently.

We are not running on AWS so S3 is not an option - and we do not have
access to NFS/other shared disk from the Mappers.

If the hive classes can access HDFS, that would likely be most ideal - and
it would seem that should be possible.  I am not clear how to do that, since
the standard hdfs api requires a Configuration to be supplied - which is
not available.

Pointers appreciated.

stephenb


Re: Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
I should have mentioned:  we cannot use add file here because this
is running within a framework.   We need to use the Java APIs.


2013/9/13 Jagat Singh jagatsi...@gmail.com

 Hi

 You can use distributed cache and hive add file command

 See here for example syntax


 http://stackoverflow.com/questions/15429040/add-multiple-files-to-distributed-cache-in-hive

 Regards,

 Jagat


 On Sat, Sep 14, 2013 at 9:57 AM, Stephen Boesch java...@gmail.com wrote:


 We have a UDF that is configured via a small properties file.  What are
 the options for distributing the file for the task nodes?  Also we want to
 be able to update the file frequently.

 We are not running on AWS so S3 is not an option - and we do not have
 access to NFS/other shared disk from the Mappers.

 If the hive classes can access HDFS that would be likely most ideal - and
 it would seem should be possible.  I am not clear how to do that - since
 the standard hdfs api requires the  Configuration to be supplied - which is
 not available.

 Pointers appreciated.

 stephenb





Re: Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
Hi Jagat,

There is no call that loads a file from hdfs in Edward's example (which I
had btw already seen).

I am looking into using getRequiredFiles()



2013/9/13 Jagat Singh jagatsi...@gmail.com

 Sorry i missed that

 Just check this example for accessing from API

 https://github.com/edwardcapriolo/hive-geoip/




 On Sat, Sep 14, 2013 at 10:12 AM, Stephen Boesch java...@gmail.comwrote:

 I should have mentioned:  we cannot use add file here because this
 is running within a framework.   We need to use the Java APIs.


 2013/9/13 Jagat Singh jagatsi...@gmail.com

 Hi

 You can use distributed cache and hive add file command

 See here for example syntax


 http://stackoverflow.com/questions/15429040/add-multiple-files-to-distributed-cache-in-hive

 Regards,

 Jagat


 On Sat, Sep 14, 2013 at 9:57 AM, Stephen Boesch java...@gmail.comwrote:


 We have a UDF that is configured via a small properties file.  What are
 the options for distributing the file for the task nodes?  Also we want to
 be able to update the file frequently.

 We are not running on AWS so S3 is not an option - and we do not have
 access to NFS/other shared disk from the Mappers.

 If the hive classes can access HDFS that would be likely most ideal -
 and it would seem should be possible.  I am not clear how to do that -
 since the standard hdfs api requires the  Configuration to be supplied -
 which is not available.

 Pointers appreciated.

 stephenb







Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
We have a small (3GB / 280M rows) table with 435 partitions that is highly
skewed:  one partition has nearly 200M, two others have nearly 40M apiece,
and the remaining 432 altogether hold less than 1% of total table size.

So .. the skew is something to be addressed.  However - even given that -
why would the following occur?


Table Structure:

 # Partition Information
# col_name data_type   comment
 derived_create_dt   string   None

# Detailed Table Information
 ..
Protect Mode:   None
Retention:   0
 ..
Table Type: MANAGED_TABLE
Table Parameters:
 SORTBUCKETCOLSPREFIX TRUE
transient_lastDdlTime 1379678551

# Storage Information
SerDe Library:   org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
 InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat:   org.apache.hadoop.hive.ql.io.RCFileOutputFormat
 Compressed: No
Num Buckets: 64
 Bucket Columns: [station_id]
Sort Columns:   [Order(col:station_id, order:1)]
 Storage Desc Params:
serialization.format 1

HIGHLY SKEWED data.
This particular load:
300M rows
 4GB
435 partitions
   Over 99% of data in just 3 out of the 435 partitions
2013-09-18 26733990
  2013-09-19 191634067
  2013-09-20 63790065



Map takes 10 min
Reduce 13 mins
Loading into partitions takes 3 hours 27 minutes
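
For what it's worth: if the load is a dynamic-partition insert, clustering
rows by their target partition before the write often shrinks this final
phase, since each reducer then opens only a handful of partition files. A
sketch - the table names, the extra column, and the salt are assumptions
about the job, and the table's bucketing/sorting spec is ignored for brevity:

  set hive.exec.dynamic.partition=true;
  set hive.exec.dynamic.partition.mode=nonstrict;

  -- the second distribute term spreads the three huge dates over several
  -- reducers instead of funneling ~200M rows through one
  insert overwrite table target partition (derived_create_dt)
  select station_id, other_col, derived_create_dt
  from src
  distribute by derived_create_dt, pmod(hash(station_id), 16);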


Re: Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
Another detail:   ~400 mappers  64 reducers


2013/9/20 Stephen Boesch java...@gmail.com


 We have a small (3GB / 280M rows) table with 435 partitions that is highly
 skewed:  one partition has nearly 200M, two others have nearly 40M apiece,
 and the remaining 432 altogether hold less than 1% of total table size.

 So .. the skew is something to be addressed.  However - even given that -
 why would the following occur?


 Table Structure:

  # Partition Information
 # col_name data_type   comment
  derived_create_dt   string   None

 # Detailed Table Information
  ..
 Protect Mode:   None
 Retention:   0
  ..
 Table Type: MANAGED_TABLE
 Table Parameters:
  SORTBUCKETCOLSPREFIX TRUE
 transient_lastDdlTime 1379678551

 # Storage Information
 SerDe Library:   org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
  InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
 OutputFormat:   org.apache.hadoop.hive.ql.io.RCFileOutputFormat
  Compressed: No
 Num Buckets: 64
  Bucket Columns: [station_id]
 Sort Columns:   [Order(col:station_id, order:1)]
  Storage Desc Params:
 serialization.format 1

 HIGHLY SKEWED data.
 This particular load:
 300M rows
  4GB
 435 partitions
    Over 99% of data in just 3 out of the 435 partitions
 2013-09-18 26733990
   2013-09-19 191634067
   2013-09-20 63790065



 Map takes 10 min
 Reduce 13 mins
 Loading into partitions takes 3 hours 27 minutes





Re: Predicate pushdown optimisation not working for ORC

2014-04-02 Thread Stephen Boesch
Hi Abhay,
  What is the DDL for your test table?


2014-04-02 22:36 GMT-07:00 Abhay Bansal abhaybansal.1...@gmail.com:

 I am new to Hive; apologies for asking such a basic question.

 The following exercise was done with hive 0.12 and hadoop 0.20.203.

 I created an ORC file from Java, and pushed it into a table with the same
 schema. I checked the conf property
 <property><name>hive.optimize.ppd</name><value>true</value></property>
 which should ideally use the ppd optimisation.

 I ran a query: select sourceipv4address,sessionid,url from test where
 sourceipv4address='dummy';

 Just to see if the ppd optimization is working I checked the hadoop logs
 where I found

 ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_00_0/syslog:2014-04-03
 05:01:39,913 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included
 column ids = 3,8,13
 ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_00_0/syslog:2014-04-03
 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included
 columns names = sourceipv4address,sessionid,url
 ./userlogs/job_201404010833_0036/attempt_201404010833_0036_m_00_0/syslog:2014-04-03
 05:01:39,914 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: *No
 ORC pushdown predicate*

  I am not sure which part of it I missed. Any help would be appreciated.

 Thanks,
 -Abhay
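
For ORC, hive.optimize.ppd on its own governs only pushdown within the query
plan; getting the predicate down into the ORC reader (as a SearchArgument)
additionally needs the index-filter flag. A sketch against the table from
the thread - offered as the usual fix rather than a verified diagnosis:

  set hive.optimize.ppd=true;
  -- without this the reader logs "No ORC pushdown predicate"
  set hive.optimize.index.filter=true;

  select sourceipv4address, sessionid, url
  from test
  where sourceipv4address = 'dummy';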



More recent Hive-Hbase Integration info/docs

2014-07-10 Thread Stephen Boesch
The url for the hbase-hive integration:

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

has old versions:  Hbase 0.92.0  and hadoop 0.20.x

Are there any significant changes to these docs that anyone might (a) have
pointers to or (b) be able/willing to mention here as important updates?

Thanks.
stephenb


Re: How to clean up a table for which the underlying hdfs file no longer exists

2015-03-22 Thread Stephen Boesch
The folder exists: just not the file.  I tried both of Daniel's suggestions
and they ended up with the same error afterwards.

2015-03-22 2:07 GMT-07:00 Wollert, Fabian fabian.woll...@zalando.de:

 can you create the specified folder (just leave it empty) and then delete
 again?

 Cheers
 Fabian

 2015-03-22 3:15 GMT+01:00 Stephen Boesch java...@gmail.com:


 There is a hive table for which the metadata points to a non-existing
 hdfs file.  Simply calling

 drop table mytable

 results in:

 Failed to load metadata for table: db.mytable
  Caused by TableLoadingException: Failed to load metadata for table: 
 db.mytable
 File does not exist:  hdfs://
 Caused by FileNotFoundException: File does not exist: hdfs:// ..


  So:  the file does not exist in hdfs, and it is not possible to remove
 the metadata for it directly. Is the next step going to be: run some sql
 commands against the metastore to manually delete the associated rows?  If
 so,  what are those delete commands?

 thanks




 --
 *Fabian Wollert*
 Business Intelligence



 *POSTANSCHRIFT*
 Zalando SE
 11501 Berlin

 *STANDORT*
 Zalando SE
 Mollstraße 1
 10178 Berlin
 Germany

 Phone: +49 30 20968 1819
 Fax:   +49 30 27594 693
 E-Mail: fabian.woll...@zalando.de
 Web: www.zalando.de
 Jobs: jobs.zalando.de

 Zalando SE, Tamara-Danz-Straße 1, 10243 Berlin
 Handelsregister: Amtsgericht Charlottenburg, HRB 158855 B
 Steuer-Nr. 29/560/00596 * USt-ID-Nr. DE 260543043
 Vorstand: Robert Gentz, David Schneider, Rubin Ritter
 Vorsitzende des Aufsichtsrates: Cristina Stenbeck
 Sitz der Gesellschaft: Berlin



How to clean up a table for which the underlying hdfs file no longer exists

2015-03-21 Thread Stephen Boesch
There is a hive table for which the metadata points to a non-existing hdfs
file.  Simply calling

drop table mytable

results in:

Failed to load metadata for table: db.mytable
Caused by TableLoadingException: Failed to load metadata for table: db.mytable
File does not exist:  hdfs://
Caused by FileNotFoundException: File does not exist: hdfs:// ..


So:  the file does not exist in hdfs, and it is not possible to remove the
metadata for it directly. Is the next step going to be: run some sql
commands against the metastore to manually delete the associated rows?  If
so,  what are those delete commands?

thanks
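
One workaround that avoids editing metastore rows by hand, sketched below:
repoint the table at a location that does exist, then drop it. The
placeholder path is hypothetical, and this only helps if the client will
still execute DDL against the broken table; if you do end up deleting rows
from the metastore directly, back it up first, since the row layout is
schema-version dependent.

  -- point the table at any existing (empty) directory, then drop it
  alter table mytable set location 'hdfs://namenode/tmp/empty_placeholder';
  drop table mytable;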


Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
When querying a hive table according to a partitioning column, it would be
logical that a simple

select count(distinct partitioned_column_name) from my_partitioned_table

would complete almost instantaneously.

But we are seeing that both hive and impala are unable to execute this
query properly: they just read the entire table!

What do we need to do to ensure the above command executes rapidly?


Re: Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
Great, thanks.  Is this a server-side-only / requires-restart parameter?

2015-02-23 22:36 GMT-08:00 Gopal Vijayaraghavan gop...@apache.org:

 Hi,

 Are you sure you have

 hive.optimize.metadataonly=true ?

 I’m not saying it will complete instantaneously (possibly even be very
 slow, due to the lack of a temp-table optimization of that), but it won’t
 read any part of the actual table.

 Cheers,
 Gopal

 From: Stephen Boesch java...@gmail.com
 Reply-To: user@hive.apache.org user@hive.apache.org
 Date: Monday, February 23, 2015 at 10:26 PM
 To: user@hive.apache.org user@hive.apache.org
 Subject: Select distinct on partitioned column requires reading all the
 files?


 When querying a hive table according to a partitioning column, it would be
 logical that a simple

 select count(distinct partitioned_column_name) from my_partitioned_table

 would complete almost instantaneously.

 But we are seeing that both hive and impala are unable to execute this
 query properly: they just read the entire table!

 What do we need to do to ensure the above command executes rapidly?
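
For reference, hive.optimize.metadataonly is an ordinary session-level
optimizer flag, so it can be set per query from the client with no server
restart; a sketch:

  set hive.optimize.metadataonly=true;
  -- answered from partition metadata; no table files should be read
  select count(distinct partitioned_column_name) from my_partitioned_table;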



Re: SELECT without FROM

2016-03-10 Thread Stephen Boesch
>> any database

Just as trivia:
I have not used Oracle for quite a while, but it traditionally does not.

AFAICT it is also not ANSI SQL.

2016-03-10 17:47 GMT-08:00 Shannon Ladymon :

> It looks like FROM was made optional in Hive 0.13.0 with HIVE-4144
>  (thanks to Alan Gates for
> pointing us to the grammar file and Sushanth Sowmyan for helping track this
> down). A note has been added to the wiki
> 
> about this.
>
> Dmitry, you said it didn't work in your Hive 0.13 version, but it seems
> like the patch was applied to 0.13.0. You might want to take a look at
> HIVE-4144 and see if that patch was applied to your version.
>
> -Shannon Ladymon
>
> On Wed, Mar 9, 2016 at 2:13 AM, Mich Talebzadeh  > wrote:
>
>> hm. Certainly it worked if I recall correctly on 0.14, 1.2.1 and now on 2
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 9 March 2016 at 10:08, Dmitry Tolpeko  wrote:
>>
>>> Not sure. It does not work in my Hive 0.13 version.
>>>
>>> On Wed, Mar 9, 2016 at 1:06 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 I believe it has always been there

 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



 http://talebzadehmich.wordpress.com



 On 9 March 2016 at 09:58, Dmitry Tolpeko  wrote:

> Mich,
>
> I know that. I just want to trace when it was added to Hive.
>
> Dmitry
>
> On Wed, Mar 9, 2016 at 12:56 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> AFAIK any database does that!
>>
>> 1> set nocount on
>> 2> select @@version
>> 3> select 1 + 1
>> 4> go
>>
>>
>>
>>  
>> ---
>>  Adaptive Server Enterprise/15.7/EBF 21708 SMP SP110
>> /P/x86_64/Enterprise Linux/ase157sp11x/3546/64-bit/FBO/Fri Nov  8 
>> 05:39:38
>> 2013
>>
>>  ---
>>2
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 9 March 2016 at 09:50, Dmitry Tolpeko  wrote:
>>
>>> I noticed that Hive allows you to execute SELECT without FROM clause
>>> (tested in Hive 0.14, Hive 1.2.1):
>>>
>>> SELECT 1+1;
>>>
>>> In which version was it added (is there a Jira)? I see that it is
>>> not mentioned in docs
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select
>>>
>>> So the question whether it is official and will not be removed in
>>> future.
>>>
>>> Thanks,
>>>
>>> Dmitry
>>>
>>
>>
>

>>>
>>
>


Re: Removing Hive-on-Spark

2020-07-27 Thread Stephen Boesch
Why would it be this way instead of the other way around?

On Mon, 27 Jul 2020 at 12:27, David  wrote:

> Hello Hive Users.
>
> I am interested in gathering some feedback on the adoption of
> Hive-on-Spark.
>
> Does anyone care to volunteer their usage information and would you be
> open to removing it in favor of Hive-on-Tez in subsequent releases of Hive?
>
> If you are on MapReduce still, would you be open to migrating to Tez?
>
> Thanks.
>