Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Stephen Boesch
Hi Ashutosh, I am interested in, and reviewing, your windowing feature. Can you be more specific about which (a) tests and (b) src files constitute your additions (there are lots of files there ;) )? Thanks, Stephen Boesch 2013/2/21 Ashutosh Chauhan > Kumar, > > If you are willing

No such file or directory error on simple query

2013-03-02 Thread Stephen Boesch
I am struggling with a "no such file or directory" exception when running a simple query in hive. It is unfortunate that the actual path was not included with the stacktrace: the following is all that is provided. I have a query that fails with the following error when done as hive -e "sele

Re: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Stephen Boesch
@Peter Does the query plan demonstrate that the 3Meg row table is being map-joined and the 400M table streamed through? That is what you want: but you might either need to fiddle with hints to get it to happen Details: Read uuids s of feed into in-memory map on all nodes (mapjoin) Stream

Re: Hive QL - NOT IN, NOT EXIST

2013-05-06 Thread Stephen Boesch
Hi Peter, Looks like mapjoin does not work with outer join so streamtable is instead a possible approach. You would stream the larger table through the smaller one: can you see whether the following helps your perf issue? select /*+ streamtable(message) */ f.uuid from message m right outer j

When to use bucketed tables with/instead of partitioned tables

2013-06-16 Thread Stephen Boesch
I am accustomed to using partitioned tables to obtain separate directories for data files in each partition. When looking at the documentation for bucketed tables it seems they are typically used in conjunction with distribute by/sort by and an appropriate partitioning key - and thus provide abili

Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
We have a few dozen files that need to be made available to all mappers/reducers in the cluster while running hive transformation steps. It seems that "add archive" does not unarchive the entries and thus make them available directly on the default file path, and that is what we are looking for. T

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
e thing is given you're using a relative path "hive/parse_qx.py" you > need to know what the "current directory" is when the process runs on the > data nodes. > > > > > On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote: > >> >> We

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
appname2 string) from eqx ) o insert overwrite table c select o.aappname2, o.qappname2; Get an error: ;) Check the logs: Caused by: java.io.IOException: Cannot run program "classifier_wf.py": java.io.IOException: error=2, No such file or directory 2013/6/20 Stephen Boesch

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
e > path to that tarball that's kinda up in the air when it's added as 'add > file'. Yeah. "local downloads directory". What's the literal path is > what i'd like to know. :) > > > On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch wrote

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
e archive isn't unpacked on the remote side. I think add archive > is mostly used for finding java packages since CLASSPATH will reference the > archive (and as such there is no need to expand it.) > > > On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch wrote: > >> thx for

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
, *are you not supposed to use "hivetry" as the > directory? > > May be you should try giving the full path " > /opt/am/ver/1.0/hive/hivetry/classifier_wf.py" and see if it works. > > Regards, > Ramki. > > > On Thu, Jun 20, 2013 at 9:28 AM, Stephen

Re: hive query is very slow,why?

2013-07-18 Thread Stephen Boesch
one mapper. how big is the table? 2013/7/18 ch huang > i wait long time,no result ,why hive is so slow? > > hive> select cookie,url,ip,source,vsid,token,residence,edate from > hb_cookie_history where edate>='1371398400500' and edate<='1371400200500'; > Total MapReduce jobs = 1 > Launching Job

Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
Views should theoretically not incur performance penalties: they simply represent queries. Are there situations in which things are "not that simple", i.e. views may actually result in different execution plans than the underlying sql? Additionally, are there views-related bugs that we should be awar

Re: Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
e/IMPALA-495 > On Aug 20, 2013 7:16 PM, "Stephen Boesch" wrote: > >> Views should theoretically not incur performance penalties: they simply >> represent queries. Are there situations that things are "not that simple" - >> i.e. views may actually result

Re: Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
. The view is compiled and has no penalty over the >> standard query. >> >> >> On Tuesday, August 20, 2013, Ricky Saltzer wrote: >> > Since this bug was in Impala's query planner, I'm sure Hive is >> unaffected. >> > >> > On Aug 20, 2

BNF for Hive Views

2013-08-25 Thread Stephen Boesch
It appears a bit challenging to find the BNFs for the Hive DDL statements. After a few Google searches the following popped up for cdh3 and only for a subset of table creation statements. http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html Is there an updated and more complete DDL

Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
hing in the xdocs > is in the wiki now (except for some nifty headings in the CREATE TABLE > section, which ought to be added to the wiki). > > -- Lefty > > > On Sun, Aug 25, 2013 at 4:38 PM, Stephen Boesch wrote: > >> >> It appears a bit challenging to f

Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
> > On Mon, Aug 26, 2013 at 12:14 AM, Stephen Boesch wrote: > >> I was already well familiar with the content of the links you provided. >> I have a specific question about the BNF for views (and potentially other >> ddl/dml) that does not appear to be addressed. Thanks. >

Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
The antlr file (Hive.g) is providing the info I need for this specific case, but if a BNF exists a pointer would still be helpful. Thx 2013/8/25 Stephen Boesch > yes i had read and re-read it. I do have a specific reason for wishing > to view the bnf. thanks. > > > 2013/8/25

Pseudo column for the entire Line/Row ?

2013-08-30 Thread Stephen Boesch
I am writing a UDF that will perform validation on the input row and shall require access to every column in the row (or alternatively to simply the unparsed/pre-processed line). Is there any way to achieve this? Or will it be simply necessary to declare an overloaded evaluate() method with a sig

Re: DISCUSS: Hive language manual to be source control managed

2013-09-01 Thread Stephen Boesch
Will this allow BNFs for the DDL / DML to be provided and kept up to date more readily? 2013/9/1 Edward Capriolo > Over the past few weeks I have taken several looks over documents in our > wiki. > The page that strikes me as alarmingly poor is the: > https://cwiki.apache.org/Hive/languagem

Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
We have a UDF that is configured via a small properties file. What are the options for distributing the file for the task nodes? Also we want to be able to update the file frequently. We are not running on AWS so S3 is not an option - and we do not have access to NFS/other shared disk from the M

Re: Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
> http://stackoverflow.com/questions/15429040/add-multiple-files-to-distributed-cache-in-hive > > Regards, > > Jagat > > > On Sat, Sep 14, 2013 at 9:57 AM, Stephen Boesch wrote: > >> >> We have a UDF that is configured via a small properties file. What are

Re: Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
/edwardcapriolo/hive-geoip/ > > > > > On Sat, Sep 14, 2013 at 10:12 AM, Stephen Boesch wrote: > >> I should have mentioned: we can not use the "add file" here because this >> is running within a framework. we need to use Java api's >> >> >

Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
We have a small (3GB /280M rows) table with 435 partitions that is highly skewed: one partition has nearly 200M, two others have nearly 40M apiece, then the remaining 432 have all together less than 1% of total table size. So .. the skew is something to be addressed. However - even given that - w

Re: Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
Another detail: ~400 mappers 64 reducers 2013/9/20 Stephen Boesch > > We have a small (3GB /280M rows) table with 435 partitions that is highly > skewed: one partition has nearly 200M, two others have nearly 40M apiece, > then the remaining 432 have all together less than 1% of
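A rough worked example of the skew described in this thread, using only the figures quoted above (about 200M of 280M rows in one partition, 64 reducers). It is a sketch under a loud assumption that the thread does not confirm: that all rows of one partition are handled by a single reducer. The function name skew_summary is hypothetical.

```python
def skew_summary(largest_rows, total_rows, reducers):
    """Return (share of all rows in the largest partition,
    that share relative to a perfectly even split across reducers)."""
    largest_share = largest_rows / total_rows
    even_share = 1 / reducers
    return largest_share, largest_share / even_share


# Figures taken from the thread; the one-partition-per-reducer routing
# is an ASSUMPTION made for illustration only.
share, ratio = skew_summary(200_000_000, 280_000_000, 64)
print(f"largest partition holds {share:.0%} of the rows")
print(f"that is about {ratio:.0f}x the load of an even split")
```

Under that assumption, adding reducers would not help much, because one reducer still carries most of the rows.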

Re: Predicate pushdown optimisation not working for ORC

2014-04-02 Thread Stephen Boesch
Hi Abhay, What is the DDL for your "test" table? 2014-04-02 22:36 GMT-07:00 Abhay Bansal : > I am new to Hive, apologise for asking such a basic question. > > Following exercise was done with hive .12 and hadoop 0.20.203 > > I created a ORC file from java, and pushed it into a table with the s

Re: MongoDB storage handler for HIVE

2011-11-17 Thread Stephen Boesch
Nice idea! I have worked a bit with Mongo and am leaning towards hive. This could be a nice combo. Will check it out (pun intended) 2011/11/17 YC Huang > I just have a quick and dirty implementation of a MongoDB storage handler > for HIVE, the project is hosted on GitHub: > https://github.com/

Re: Mysql metastore configuration error.

2011-11-21 Thread Stephen Boesch
Was that code above *verbatim*? Because there is a typo: Hive> Load *s*ata local inpath ‘path/to/abcd.txt’ into table abcd; (load sata not load data) 2011/11/21 Aditya Singh30 > Hi Everybody, > > I am using Apache’s Hadoop-0.20.2 and > Apache’s Hive-0.7.0.

Re: Important Question

2012-01-25 Thread Stephen Boesch
Dalia your requirements appear to be transaction oriented and thus OLTP systems - i.e. regular relational databases - are more likely to be suitable than a hive (/hadoop) based solution. Hive is more for business intelligence and certainly includes latencies - which by saying 'realtime' - would

Error while reading from task log url

2012-03-29 Thread Stephen Boesch
Hi I am able to run certain hive commands e.g. create table and select.. but not others ..Also my hadoop pseudo-distributed cluster is working fine - i can run the examples. Examples of commands that fail: insert overwrite table demographics select * from demographics_local; Control-C (kill

Re: Error while reading from task log url

2012-03-29 Thread Stephen Boesch
When I go to that url here is the result: HTTP ERROR 400 Problem accessing /tasklog. Reason: Argument attemptid is required -- *Powered by Jetty:// * 2012/3/29 Stephen Boesch > Hi > I am able to run certain hive commands e.g. create table and

How to use create .. as select

2012-03-29 Thread Stephen Boesch
I see hive-31 supposedly supports this, but when mimicking the syntax in the jira i get errors https://issues.apache.org/jira/browse/HIVE-31 hive> create table dem select demographics_local.* from demographics_local; FAILED: Parse Error: line 1:19 cannot recognize input near 'select' 'demograp
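A minimal, hedged sketch of the syntax point raised in this thread: the statement quoted above has no AS keyword between the table name and the SELECT, and the CREATE TABLE ... AS SELECT form expects one. The snippet uses SQLite through Python's sqlite3 module purely as a stand-in so that it runs anywhere; it is not Hive, and Hive's error text differs. The two-column schema and the sample rows are hypothetical.

```python
import sqlite3


def ctas_demo():
    """Show that CREATE TABLE ... SELECT fails without AS and works with it.

    SQLite is an ASSUMED stand-in for Hive here; only the shape of the
    statement is being illustrated.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and rows for the source table.
    conn.execute("CREATE TABLE demographics_local (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO demographics_local VALUES (?, ?)",
        [(1, "a"), (2, "b")],
    )

    # Same shape as the failing statement in the thread: no AS keyword.
    try:
        conn.execute(
            "CREATE TABLE dem SELECT demographics_local.* FROM demographics_local"
        )
        missing_as_failed = False
    except sqlite3.OperationalError:
        missing_as_failed = True

    # With AS, the table is created and populated from the query.
    conn.execute(
        "CREATE TABLE dem AS SELECT demographics_local.* FROM demographics_local"
    )
    row_count = conn.execute("SELECT COUNT(*) FROM dem").fetchone()[0]
    conn.close()
    return missing_as_failed, row_count


print(ctas_demo())
```

In Hive the corresponding statement would read create table dem as select demographics_local.* from demographics_local; whether that resolves the poster's error on their Hive version is not confirmed by the snippet shown here.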

Re: Error while reading from task log url

2012-03-29 Thread Stephen Boesch
change > taskid to attemptid in the url then you'll get the error logs you need > to debug the root cause. > > I'll try and submit a patch at some point. > > Phil. > > On 29 March 2012 17:36, Stephen Boesch wrote: > > When I go to that url here is the result: >

Re: How to use create .. as select

2012-03-30 Thread Stephen Boesch
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual > > Or if you are feeling brave look in the clientpositive directory > of the source code. > > Edward > > On Thu, Mar 29, 2012 at 1:15 PM, Stephen Boesch wrote: > > I see hive-31 supposedly supports this,

Re: How to use create .. as select

2012-03-30 Thread Stephen Boesch
> Regards > Bejoy KS > ---------- > *From:* Stephen Boesch > *To:* user@hive.apache.org > *Sent:* Thursday, March 29, 2012 10:45 PM > *Subject:* How to use create .. as select > > I see hive-31 supposedly supports this, but when mimicking the syntax in > the

Custom hive-site.xml is ignored, how to find out why

2012-11-24 Thread Stephen Boesch
It seems the customized hive-site.xml is not being read. It lives under $HIVE_HOME/conf ( which happens to be /shared/hive/conf). I have tried everything there is to try: set HIVE_CONF_DIR=/shared/hive/conf , added --config /shared/hive/conf and added debugging to the hive shell script (bash -x

Re: Custom hive-site.xml is ignored, how to find out why

2012-11-24 Thread Stephen Boesch
It appears that I was missing the *hive.metastore.uris* parameter. That one was not mentioned in the (several) blogs / tutorials that I had seen. 2012/11/24 Stephen Boesch > > It seems the customized hive-site.xml is not being read. It lives under > $HIVE_HOME/conf ( which happ

Re: Custom hive-site.xml is ignored, how to find out why

2012-11-26 Thread Stephen Boesch
ord >password > > > >datanucleus.autoCreateSchema >false > > > >datanucleus.fixedDatastore >true > > > > Thanks. > Shreepadma > > > On Sat, Nov 24, 2012 at 8:41 PM, Stephen Boesch wrote: > >> It appears that I were missing the *hive.met

hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
I am seeing the following message in the logs (which are in the wrong place under /tmp..) hive-site.xml not found on classpath My hive-site.xml is under the standard location $HIVE_HOME/conf so this should not happen. Now some posts have suggested that the HADOOP_CLASSPATH was mangled. Mine is no

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
i am running under user steve. the latest log (where this shows up ) is /tmp/steve/hive.log 2012/11/29 Viral Bajaria > You are seeing this error when you run the hive cli or in the tasktracker > logs when you run a query ? > > On Thu, Nov 29, 2012 at 12:42 AM, Stephen Boesch wrot

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
Yes. 2012/11/29 Shreepadma Venugopalan > Are you seeing this message when you bring up the standalone hive cli by > running 'hive'? > > > On Thu, Nov 29, 2012 at 12:56 AM, Stephen Boesch wrote: > >> i am running under user steve. the latest log (where

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
setting HIVE_HOME and HIVE_CONF_DIR? > > > On Thu, Nov 29, 2012 at 2:46 PM, Stephen Boesch wrote: > >> Yes. >> >> >> 2012/11/29 Shreepadma Venugopalan >> >>> Are you seeing this message when you bring up the standalone hive cli >>> by running

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
> > Did you mean that you set a different log dir but it didn't work? > > the log dir should be set in conf/hive-log4j.properties, > conf/hive-exec-log4j.properties > and you can try to reset HIVE_CONF_DIR in conf/hive-env.sh with ‘export" > command. > > - Bing >

Re: hive-site.xml not found on classpath

2012-11-30 Thread Stephen Boesch
T" >exit 7 > else >$TORUN "$@" > fi > > The version I used is 0.9.0 > > > > 2012/11/30 Stephen Boesch > >> Yes i do mean the log is in the wrong location, since it was set to a >> persistent path in the $HIVE_CONF_DIR/lhive-log4j.p

Re: hive-site.xml not found on classpath

2012-12-09 Thread Stephen Boesch
> Like so: > > EXPORT HIVE_OPTS= -hiveconf hive.log.dir=$ HIVE_HOME\logs" > > Then run “hive” > > Or: > > Run “hive --hiveconf hive.log.dir=$ HIVE_HOME\logs” > > Thanks, > > Lauren > >

Re: hive-site.xml not found on classpath

2012-12-09 Thread Stephen Boesch
allowing the metadata to be stored in mysql 2012/12/9 Stephen Boesch > The first element of the classpath is the right one already.. but I STILL > get the hive-site.xml is not found in classpath. Only hive gives me > issues. hdfs, mapred, hbase are all running fine. > &

More recent Hive-Hbase Integration info/docs

2014-07-10 Thread Stephen Boesch
The url for the hbase-hive integration: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration has old versions: Hbase 0.92.0 and hadoop 0.20.x Are there any significant changes to these docs that anyone might (a) have pointers to or (b) be able/willing to mention here as important

Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
When querying a hive table according to a partitioning column, it would be logical that a simple select count(distinct partitioned_column_name) from my_partitioned_table would complete almost instantaneously. But we are seeing that both hive and impala are unable to execute this query properly:

Re: Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
ow, due to the lack of a temp-table optimization of that), but it won’t > read any part of the actual table. > > Cheers, > Gopal > > From: Stephen Boesch > Reply-To: "user@hive.apache.org" > Date: Monday, February 23, 2015 at 10:26 PM > To: "user@hive.apac

How to clean up a table for which the underlying hdfs file no longer exists

2015-03-21 Thread Stephen Boesch
There is a hive table for which the metadata points to a non-existing hdfs file. Simply calling drop table results in: Failed to load metadata for table: db.mytable Caused by TableLoadingException: Failed to load metadata for table: db.mytable File does not exist: hdfs:// Caused by Fi

Re: How to clean up a table for which the underlying hdfs file no longer exists

2015-03-22 Thread Stephen Boesch
> > 2015-03-22 3:15 GMT+01:00 Stephen Boesch : > >> >> There is a hive table for which the metadata points to a non-existing >> hdfs file. Simply calling >> >> drop table >> >> results in: >> >> Failed to load metadata for tabl

Re: SELECT without FROM

2016-03-10 Thread Stephen Boesch
>> any database Just as trivia: i have not used oracle for quite a while but it traditionally does not. AFAICT it is also not ansi sql 2016-03-10 17:47 GMT-08:00 Shannon Ladymon : > It looks like FROM was made optional in Hive 0.13.0 with HIVE-4144 >

Re: Removing Hive-on-Spark

2020-07-27 Thread Stephen Boesch
Why would it be this way instead of the other way around? On Mon, 27 Jul 2020 at 12:27, David wrote: > Hello Hive Users. > > I am interested in gathering some feedback on the adoption of > Hive-on-Spark. > > Does anyone care to volunteer their usage information and would you be > open to removin