Re: MongoDB storage handler for HIVE

2011-11-17 Thread Stephen Boesch
Nice idea! I have worked a bit with Mongo and am leaning towards Hive. This could be a nice combo. Will check it out (pun intended). 2011/11/17 YC Huang ychuang...@gmail.com I just have a quick and dirty implementation of a MongoDB storage handler for HIVE, the project is hosted on GitHub:

Re: Mysql metastore configuration error.

2011-11-21 Thread Stephen Boesch
Was that code above *verbatim*? Because there is a typo: Hive Load *s*ata local inpath ‘path/to/abcd.txt’ into table abcd; (load sata not load data) 2011/11/21 Aditya Singh30 aditya_sing...@infosys.com Hi Everybody, I am using Apache’s Hadoop-0.20.2 and

Re: Important Question

2012-01-25 Thread Stephen Boesch
Dalia, your requirements appear to be transaction-oriented, and thus OLTP systems (i.e. regular relational databases) are more likely to be suitable than a Hive/Hadoop-based solution. Hive is more for business intelligence and certainly includes latencies - which by saying 'realtime' - would

Error while reading from task log url

2012-03-29 Thread Stephen Boesch
Hi, I am able to run certain Hive commands, e.g. create table and select, but not others. Also, my Hadoop pseudo-distributed cluster is working fine - I can run the examples. Examples of commands that fail: insert overwrite table demographics select * from demographics_local; Control-C

Re: Error while reading from task log url

2012-03-29 Thread Stephen Boesch
When I go to that url here is the result: HTTP ERROR 400 Problem accessing /tasklog. Reason: Argument attemptid is required -- *Powered by Jetty:// * 2012/3/29 Stephen Boesch java...@gmail.com Hi I am able to run certain hive commands e.g. create table

How to use create .. as select

2012-03-29 Thread Stephen Boesch
I see hive-31 supposedly supports this, but when mimicking the syntax in the jira i get errors https://issues.apache.org/jira/browse/HIVE-31 hive create table dem select demographics_local.* from demographics_local; FAILED: Parse Error: line 1:19 cannot recognize input near 'select'

Re: How to use create .. as select

2012-03-30 Thread Stephen Boesch
KS -- *From:* Stephen Boesch java...@gmail.com *To:* user@hive.apache.org *Sent:* Thursday, March 29, 2012 10:45 PM *Subject:* How to use create .. as select I see hive-31 supposedly supports this, but when mimicking the syntax in the jira i get errors https

Custom hive-site.xml is ignored, how to find out why

2012-11-24 Thread Stephen Boesch
It seems the customized hive-site.xml is not being read. It lives under $HIVE_HOME/conf ( which happens to be /shared/hive/conf). I have tried everything there is to try: set HIVE_CONF_DIR=/shared/hive/conf , added --config /shared/hive/conf and added debugging to the hive shell script (bash

Re: Custom hive-site.xml is ignored, how to find out why

2012-11-24 Thread Stephen Boesch
It appears that I was missing the *hive.metastore.uris* parameter. That one was not mentioned in the (several) blogs / tutorials that I had seen. 2012/11/24 Stephen Boesch java...@gmail.com It seems the customized hive-site.xml is not being read. It lives under $HIVE_HOME/conf ( which
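A quick way to confirm whether a given hive-site.xml actually defines a property such as hive.metastore.uris is to parse the file directly. The following is a minimal sketch only, assuming the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children; the helper name `read_hive_property` and the sample URI value are hypothetical and are not taken from the thread.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the value of a named property from Hadoop-style config XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Compare on the stripped <name> text so stray whitespace does not matter.
        if (prop.findtext("name") or "").strip() == name:
            return prop.findtext("value")
    return None


# Illustrative config only: the host and port below are made-up placeholders.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>"""

print(read_hive_property(SAMPLE, "hive.metastore.uris"))
```

To check a real file, read its contents into a string first and pass that in place of `SAMPLE`; a `None` result means the property is not defined in that file.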

Re: Custom hive-site.xml is ignored, how to find out why

2012-11-26 Thread Stephen Boesch
, Stephen Boesch java...@gmail.com wrote: It appears that I was missing the *hive.metastore.uris* parameter. That one was not mentioned in the (several) blogs / tutorials that I had seen. 2012/11/24 Stephen Boesch java...@gmail.com It seems the customized hive-site.xml is not being read

hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
I am seeing the following message in the logs (which are in the wrong place under /tmp..) hive-site.xml not found on classpath My hive-site.xml is under the standard location $HIVE_HOME/conf so this should not happen. Some posts have suggested that the HADOOP_CLASSPATH was mangled. Mine is
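One way to narrow down a "not found on classpath" message is to test each directory entry of the classpath string for the file. The sketch below is an illustration under stated assumptions, not how Hive itself resolves hive-site.xml: the helper name `dirs_containing` is hypothetical, only plain directory entries are checked (jar entries and wildcards are ignored), and a temporary directory stands in for a real conf directory.

```python
import os
import tempfile


def dirs_containing(classpath, filename):
    """Return the classpath directory entries that directly contain filename."""
    hits = []
    for entry in classpath.split(os.pathsep):
        # Empty entries are skipped here as a simplification.
        if entry and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits


# Demo with a throwaway directory standing in for a Hive conf directory.
with tempfile.TemporaryDirectory() as conf_dir:
    open(os.path.join(conf_dir, "hive-site.xml"), "w").close()
    classpath = os.pathsep.join(["", "/nonexistent/lib", conf_dir])
    found = dirs_containing(classpath, "hive-site.xml")
    print(found == [conf_dir])
```

Against a live setup, the same function could be called with the value of the relevant environment variable, for example `os.environ.get("HADOOP_CLASSPATH", "")`.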

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
I am running under user steve. The latest log (where this shows up) is /tmp/steve/hive.log 2012/11/29 Viral Bajaria viral.baja...@gmail.com You are seeing this error when you run the hive cli or in the tasktracker logs when you run a query? On Thu, Nov 29, 2012 at 12:42 AM, Stephen

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
Yes. 2012/11/29 Shreepadma Venugopalan shreepa...@cloudera.com Are you seeing this message when you bring up the standalone hive cli by running 'hive'? On Thu, Nov 29, 2012 at 12:56 AM, Stephen Boesch java...@gmail.com wrote: I am running under user steve. The latest log (where

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
you tried setting HIVE_HOME and HIVE_CONF_DIR? On Thu, Nov 29, 2012 at 2:46 PM, Stephen Boesch java...@gmail.com wrote: Yes. 2012/11/29 Shreepadma Venugopalan shreepa...@cloudera.com Are you seeing this message when you bring up the standalone hive cli by running 'hive'? On Thu, Nov 29

Re: hive-site.xml not found on classpath

2012-11-29 Thread Stephen Boesch
dir but it didn't work? the log dir should be set in conf/hive-log4j.properties, conf/hive-exec-log4j.properties and you can try to reset HIVE_CONF_DIR in conf/hive-env.sh with the ‘export’ command. - Bing 2012/11/30 Stephen Boesch java...@gmail.com thought i mentioned in the posts those were

Re: hive-site.xml not found on classpath

2012-11-30 Thread Stephen Boesch
2012/11/30 Stephen Boesch java...@gmail.com Yes, I do mean the log is in the wrong location, since it was set to a persistent path in the $HIVE_CONF_DIR/hive-log4j.properties. None of the files in that directory appear to be picked up properly: neither the hive-site.xml nor

Re: hive-site.xml not found on classpath

2012-12-09 Thread Stephen Boesch
Then run “hive” Or: Run “hive --hiveconf hive.log.dir=$ HIVE_HOME\logs” Thanks, Lauren *From:* Stephen Boesch [mailto:java...@gmail.com] *Sent:* Friday, November 30, 2012 12:16 AM *To:* user@hive.apache.org *Subject:* Re: hive

Re: hive-site.xml not found on classpath

2012-12-09 Thread Stephen Boesch
to be stored in mysql 2012/12/9 Stephen Boesch java...@gmail.com The first element of the classpath is the right one already.. but I STILL get the hive-site.xml is not found in classpath. Only hive gives me issues. hdfs, mapred, hbase are all running fine. HADOOP_CLASSPATH=:*/shared/hive/conf

Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Stephen Boesch
Hi Ashutosh, I am interested in, and reviewing, your windowing feature. Can you be more specific about which (a) tests and (b) src files constitute your additions (there are lots of files there ;) )? Thanks, Stephen Boesch 2013/2/21 Ashutosh Chauhan hashut...@apache.org Kumar, If you

No such file or directory error on simple query

2013-03-02 Thread Stephen Boesch
I am struggling with a no such file or directory exception when running a simple query in hive. It is unfortunate that the actual path was not included with the stacktrace: the following is all that is provided. I have a query that fails with the following error when done as hive -e select

Re: Hive QL - NOT IN, NOT EXIST

2013-05-05 Thread Stephen Boesch
@Peter Does the query plan demonstrate that the 3Meg row table is being map-joined and the 400M table streamed through? That is what you want, but you might need to fiddle with hints to get it to happen. Details: Read uuids of feed into in-memory map on all nodes (mapjoin)

Re: Hive QL - NOT IN, NOT EXIST

2013-05-06 Thread Stephen Boesch
Hi Peter, Looks like mapjoin does not work with outer join so streamtable is instead a possible approach. You would stream the larger table through the smaller one: can you see whether the following helps your perf issue? select /*+ streamtable(message) */ f.uuid from message m right outer

Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
We have a few dozen files that need to be made available to all mappers/reducers in the cluster while running hive transformation steps. It seems the add archive does not make the entries unarchived and thus available directly on the default file path - and that is what we are looking for. To

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
you need to know what the current directory is when the process runs on the data nodes. On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch java...@gmail.com wrote: We have a few dozen files that need to be made available to all mappers/reducers in the cluster while running hive

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
in the air when it's added as 'add file'. Yeah. local downloads directory. What's the literal path is what i'd like to know. :) On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch java...@gmail.com wrote: @Stephen: given the 'relative' path for hive is from a local downloads directory

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
for finding java packages since CLASSPATH will reference the archive (and as such there is no need to expand it.) On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch java...@gmail.com wrote: thx for the tip on add file where file is directory. I will try that. 2013/6/20 Stephen Sprague sprag

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
, *are you not supposed to use hivetry as the directory? Maybe you should try giving the full path /opt/am/ver/1.0/hive/hivetry/classifier_wf.py and see if it works. Regards, Ramki. On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch java...@gmail.com wrote: Stephen: would you be willing

Re: hive query is very slow,why?

2013-07-18 Thread Stephen Boesch
one mapper. how big is the table? 2013/7/18 ch huang justlo...@gmail.com i wait long time,no result ,why hive is so slow? hive select cookie,url,ip,source,vsid,token,residence,edate from hb_cookie_history where edate='1371398400500' and edate='1371400200500'; Total MapReduce jobs = 1

Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
Views should theoretically not incur performance penalties: they simply represent queries. Are there situations in which things are not that simple - i.e. views may actually result in different execution plans than the underlying sql? Additionally, are there views-related bugs that we should be aware

Re: Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
/browse/IMPALA-495 On Aug 20, 2013 7:16 PM, Stephen Boesch java...@gmail.com wrote: Views should theoretically not incur performance penalties: they simply represent queries. Are there situations in which things are not that simple - i.e. views may actually result in different execution plans than

Re: Any Scenarios in which Views impose performance penalties

2013-08-20 Thread Stephen Boesch
. The view is compiled and has no penalty over the standard query. On Tuesday, August 20, 2013, Ricky Saltzer ri...@cloudera.com wrote: Since this bug was in Impala's query planner, I'm sure Hive is unaffected. On Aug 20, 2013 10:15 PM, Stephen Boesch java...@gmail.com wrote: Thanks

BNF for Hive Views

2013-08-25 Thread Stephen Boesch
It appears a bit challenging to find the BNFs for the Hive DDLs. After a few Google searches the following popped up for cdh3 and only for a subset of table creations. http://archive.cloudera.com/cdh/3/hive/language_manual/data-manipulation-statements.html Is there an updated and more complete DDL

Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
ought to be added to the wiki). -- Lefty On Sun, Aug 25, 2013 at 4:38 PM, Stephen Boesch java...@gmail.com wrote: It appears a bit challenging to find the BNFs for the Hive DDLs. After a few Google searches the following popped up for cdh3 and only for a subset of table creations. http

Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
at 12:14 AM, Stephen Boesch java...@gmail.com wrote: I was already well familiar with the content of the links you provided. I have a specific question about the BNF for views (and potentially other ddl/dml) that does not appear to be addressed. Thanks. 2013/8/25 Lefty Leverenz leftylever

Re: BNF for Hive Views

2013-08-25 Thread Stephen Boesch
The antlr file (Hive.g) is providing the info I need for this specific case, but if a BNF exists a pointer would still be helpful. Thx 2013/8/25 Stephen Boesch java...@gmail.com yes i had read and re-read it. I do have a specific reason for wishing to view the bnf. thanks. 2013/8/25

Pseudo column for the entire Line/Row ?

2013-08-30 Thread Stephen Boesch
I am writing a UDF that will perform validation on the input row and shall require access to every column in the row (or alternatively to simply the unparsed/pre-processed line). Is there any way to achieve this? Or will it be simply necessary to declare an overloaded evaluate() method with a

Re: DISCUSS: Hive language manual to be source control managed

2013-09-01 Thread Stephen Boesch
Will this allow BNFs for the DDL / DML to be provided and kept up to date more readily? 2013/9/1 Edward Capriolo edlinuxg...@gmail.com Over the past few weeks I have taken several looks over documents in our wiki. The page that strikes me as alarmingly poor is the:

Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
We have a UDF that is configured via a small properties file. What are the options for distributing the file for the task nodes? Also we want to be able to update the file frequently. We are not running on AWS so S3 is not an option - and we do not have access to NFS/other shared disk from the

Re: Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
/questions/15429040/add-multiple-files-to-distributed-cache-in-hive Regards, Jagat On Sat, Sep 14, 2013 at 9:57 AM, Stephen Boesch java...@gmail.com wrote: We have a UDF that is configured via a small properties file. What are the options for distributing the file for the task nodes? Also

Re: Options for Loading Side Data / small files in UDF

2013-09-13 Thread Stephen Boesch
/edwardcapriolo/hive-geoip/ On Sat, Sep 14, 2013 at 10:12 AM, Stephen Boesch java...@gmail.com wrote: I should have mentioned: we cannot use the add file here because this is running within a framework. We need to use Java APIs. 2013/9/13 Jagat Singh jagatsi...@gmail.com Hi You can use

Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
We have a small (3GB /280M rows) table with 435 partitions that is highly skewed: one partition has nearly 200M, two others have nearly 40M apiece, then the remaining 432 have all together less than 1% of total table size. So .. the skew is something to be addressed. However - even given that -

Re: Loading data into partition taking seven times total of (map+reduce) on highly skewed data

2013-09-20 Thread Stephen Boesch
Another detail: ~400 mappers 64 reducers 2013/9/20 Stephen Boesch java...@gmail.com We have a small (3GB /280M rows) table with 435 partitions that is highly skewed: one partition has nearly 200M, two others have nearly 40M apiece, then the remaining 432 have all together less than 1

Re: Predicate pushdown optimisation not working for ORC

2014-04-02 Thread Stephen Boesch
Hi Abhay, What is the DDL for your test table? 2014-04-02 22:36 GMT-07:00 Abhay Bansal abhaybansal.1...@gmail.com: I am new to Hive, apologise for asking such a basic question. Following exercise was done with hive .12 and hadoop 0.20.203 I created an ORC file from java, and pushed it

More recent Hive-Hbase Integration info/docs

2014-07-10 Thread Stephen Boesch
The url for the hbase-hive integration: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration has old versions: HBase 0.92.0 and Hadoop 0.20.x. Are there any significant changes to these docs that anyone might (a) have pointers to or (b) be able/willing to mention here as important

Re: How to clean up a table for which the underlying hdfs file no longer exists

2015-03-22 Thread Stephen Boesch
2015-03-22 3:15 GMT+01:00 Stephen Boesch java...@gmail.com: There is a hive table for which the metadata points to a non-existing hdfs file. Simply calling drop table mytable results in: Failed to load metadata for table: db.mytable Caused by TAbleLoadingException: Failed to load

How to clean up a table for which the underlying hdfs file no longer exists

2015-03-21 Thread Stephen Boesch
There is a hive table for which the metadata points to a non-existing hdfs file. Simply calling drop table mytable results in: Failed to load metadata for table: db.mytable Caused by TAbleLoadingException: Failed to load metadata for table: db.mytable File does not exist: hdfs://

Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
When querying a hive table according to a partitioning column, it would be logical that a simple select count(distinct partitioned_column_name) from my_partitioned_table would complete almost instantaneously. But we are seeing that both hive and impala are unable to execute this query properly:
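The intuition behind expecting this query to be fast is that, in Hive's conventional warehouse layout, each partition lives in a directory named `column=value`, so the distinct values of a partitioning column can in principle be read off the directory names without opening any data file. The sketch below only illustrates that idea; it does not describe what Hive or Impala actually do when planning the query, and the function name, the partition column `dt`, and the paths are all hypothetical.

```python
def distinct_partition_values(paths, column):
    """Collect the distinct values of a partition column from column=value path segments."""
    values = set()
    prefix = column + "="
    for path in paths:
        for segment in path.split("/"):
            if segment.startswith(prefix):
                values.add(segment[len(prefix):])
    return values


# Made-up file paths in a column=value partition layout.
PATHS = [
    "/warehouse/my_partitioned_table/dt=2015-02-21/000000_0",
    "/warehouse/my_partitioned_table/dt=2015-02-21/000001_0",
    "/warehouse/my_partitioned_table/dt=2015-02-22/000000_0",
]

print(len(distinct_partition_values(PATHS, "dt")))
```

Only the path strings are inspected here; the number of files or rows under each partition directory has no effect on the result.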

Re: Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
to the lack of a temp-table optimization of that), but it won’t read any part of the actual table. Cheers, Gopal From: Stephen Boesch java...@gmail.com Reply-To: user@hive.apache.org user@hive.apache.org Date: Monday, February 23, 2015 at 10:26 PM To: user@hive.apache.org user@hive.apache.org

Re: SELECT without FROM

2016-03-10 Thread Stephen Boesch
>> any database Just as trivia: i have not used oracle for quite a while but it traditionally does not. AFAICT it is also not ansi sql 2016-03-10 17:47 GMT-08:00 Shannon Ladymon : > It looks like FROM was made optional in Hive 0.13.0 with HIVE-4144 >

Re: Removing Hive-on-Spark

2020-07-27 Thread Stephen Boesch
Why would it be this way instead of the other way around? On Mon, 27 Jul 2020 at 12:27, David wrote: > Hello Hive Users. > > I am interested in gathering some feedback on the adoption of > Hive-on-Spark. > > Does anyone care to volunteer their usage information and would you be > open to