Re: Serde moved? version 2.3.0

2017-10-25 Thread Stephen Sprague
inal >> class by subclassing AbstractSerDe, unless the API has changed such that >> such a mapping cannot be done. >> >> Regards, >> Matt >> >> >> >> On Oct 25, 2017, at 7:31 PM, Owen O'Malley <owen.omal...@gmail.com> >> wrote: >>

Fwd: Serde moved? version 2.3.0

2017-10-25 Thread Stephen Sprague
e AbstractSerDe instead. > > .. Owen > > On Oct 25, 2017, at 2:18 PM, Stephen Sprague <sprag...@gmail.com> wrote: > > hey guys, > > could be a dumb question but not being a java type of guy i'm not quite > sure about it. I'm upgrading from 2.1.0 to 2.3.0 and encounteri

Serde moved? version 2.3.0

2017-10-25 Thread Stephen Sprague
hey guys, could be a dumb question but not being a java type of guy i'm not quite sure about it. I'm upgrading from 2.1.0 to 2.3.0 and encountering this error: class not found: org/apache/hadoop/hive/serde2/SerDe so in hive 2.1.0 i see it in this jar: * hive-serde-2.1.0.jar

Re: hive on spark - why is it so hard?

2017-10-01 Thread Stephen Sprague
matter of comparing the performance with Tez. Cheers, Stephen. On Wed, Sep 27, 2017 at 9:37 PM, Stephen Sprague <sprag...@gmail.com> wrote: > ok.. getting further. seems now i have to deploy hive to all nodes in the > cluster - don't think i had to do that before but not a big deal to do it

Re: hive on spark - why is it so hard?

2017-09-27 Thread Stephen Sprague
lity issue. i know. i know. no surprise here. so i guess i just got to the point where everybody else is... build spark w/o hive. lemme see what happens next. On Wed, Sep 27, 2017 at 7:41 PM, Stephen Sprague <sprag...@gmail.com> wrote: > thanks. I haven't had a chance to dig into thi

Re: hive on spark - why is it so hard?

2017-09-27 Thread Stephen Sprague
ggest taking a look at the HoS Remote Driver logs. The driver > gets launched in a YARN container (assuming you are running Spark in > yarn-client mode), so you just have to find the logs for that container. > > --Sahil > > On Tue, Sep 26, 2017 at 9:17 PM, Stephen Sprague <sp

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
MemoryAndCores(SetSparkReducerParallelism.java:236) [hive-exec-2.3.0.jar:2.3.0] i'll dig some more tomorrow. On Tue, Sep 26, 2017 at 8:23 PM, Stephen Sprague <sprag...@gmail.com> wrote: > oh. i missed Gopal's reply. oy... that sounds foreboding. I'll keep you > posted on my pr

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
oh. i missed Gopal's reply. oy... that sounds foreboding. I'll keep you posted on my progress. On Tue, Sep 26, 2017 at 4:40 PM, Gopal Vijayaraghavan wrote: > Hi, > > > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a > spark session:

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
OME set locally? Do you > have older versions of Spark installed locally? > > --Sahil > > On Tue, Sep 26, 2017 at 3:33 PM, Stephen Sprague <sprag...@gmail.com> > wrote: > >> thanks Sahil. here it is. >> >> Exception in thread "ma

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
t Spark 2.0.0. Hive may work with more recent versions > of Spark, but we only test with Spark 2.0.0. > > --Sahil > > On Tue, Sep 26, 2017 at 2:35 PM, Stephen Sprague <sprag...@gmail.com> > wrote: > >> * i've installed hive 2.3 and spark 2.2 >> >> * i've

hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
* i've installed hive 2.3 and spark 2.2 * i've read this doc plenty of times -> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started * i run this query: hive --hiveconf hive.root.logger=DEBUG,console -e 'set hive.execution.engine=spark; select date_key, count(*)

group by + two nulls in a row = bug?

2017-06-27 Thread Stephen Sprague
i'm running hive version 2.1.0 and found this interesting. i've broken it down into a trivial test case below. i run this: select a.date_key, a.property_id, cast(NULL as bigint) as malone_id, cast(NULL as bigint) as zpid,

any hive release imminent?

2017-06-19 Thread Stephen Sprague
Hey guys, Is there any word out on the street about a timeframe for the next 2.x hive release? Looks like Dec 2016 was the last one. The natives are getting restless i think. :) thanks, Stephen.

Re: How to setup the max memory for my big Hive SQL which is on MapReduce of Yarn

2017-06-06 Thread Stephen Sprague
have you researched the yarn schedulers? namely the capacity and fair schedulers? those are the places where resource limits can be easily defined. On Mon, Jun 5, 2017 at 9:25 PM, Chang.Wu <583424...@qq.com> wrote: > My Hive engine is MapReduce and Yarn. What my urgent need is to limit the >

Re: drop table - external - aws

2017-05-17 Thread Stephen Sprague
at 6:57 PM, Vihang Karajgaonkar <vih...@cloudera.com> > wrote: > >> This is interesting and possibly a bug. Did you try changing them to >> managed tables and then dropping or truncating them? How do we reproduce >> this on our setup? >> >> On Tue, May 16, 20

Re: drop table - external - aws

2017-05-17 Thread Stephen Sprague
bles and then dropping or truncating them? How do we reproduce > this on our setup? > > On Tue, May 16, 2017 at 6:38 PM, Stephen Sprague <sprag...@gmail.com> > wrote: > >> fwiw. i ended up re-creating the ec2 cluster with that same host name >> just so i

Re: drop table - external - aws

2017-05-16 Thread Stephen Sprague
at 6:38 AM, Stephen Sprague <sprag...@gmail.com> wrote: > hey guys, > here's something bizarre. i created about 200 external tables with a > location something like this 'hdfs:///path'. this was three > months ago and now i'm revisiting and want to drop these tables. > > ha!

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
who work on these kinds of things typically make > more money than the average developers. If you make more $$s it makes sense > learning this stuff is supposed to be harder. > > Conclusion, don't try it. Or try using Tez/Hive instead of Spark/Hive if > you are querying large file

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
ive group and vice versa (I can almost guarantee it based on previous >> experiences) >> >> But in hindsight, people who work on these kinds of things typically make >> more money than the average developers. If you make more $$s it makes sense >> learning this stuff is suppo

Re: hive on spark - version question

2017-03-17 Thread Stephen Sprague
:( gettin' no love on this one. any SME's know if Spark 2.1.0 will work with Hive 2.1.0 ? That JavaSparkListener class looks like a deal breaker to me, alas. thanks in advance. Cheers, Stephen. On Mon, Mar 13, 2017 at 10:32 PM, Stephen Sprague <sprag...@gmail.com> wrote: >

hive on spark - version question

2017-03-14 Thread Stephen Sprague
hi guys, wondering where we stand with Hive On Spark these days? i'm trying to run Spark 2.1.0 with Hive 2.1.0 (purely coincidental versions) and running up against this class not found: java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener searching the Cyber i find this: 1.

random KILL's in YARN

2017-01-18 Thread Stephen Sprague
hey guys, I have a question on why Hiveserver2 would issue a "killjob" signal. We run Yarn on Hadoop 5.6 with the HiveServer2 process. It uses the fair-scheduler. Pre-emption is turned off. At least twice a day we have jobs that are randomly killed. they can be big jobs, they can be small ones.

Re: tez + union stmt

2016-12-25 Thread Stephen Sprague
the table location as staging.db/foo (as you have not specified the >> location). >> >> Adding user@hive.apache.org as this is hive related. >> >> >> ~Rajesh.B >> >> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <sprag...@gmail.com>

Re: Maintaining big and complex Hive queries

2016-12-21 Thread Stephen Sprague
my 2 cents. :) as soon as you say "complex query" i would submit you've lost the upperhand and you're behind the eight-ball right off the bat. And you know this too otherwise you wouldn't have posted here. ha! i use cascading CTAS statements so that i can examine the intermediate tables.

Re: [ANNOUNCE] Apache Hive 2.1.1 Released

2016-12-08 Thread Stephen Sprague
> > On Dec 8, 2016, at 14:40, Stephen Sprague <sprag...@gmail.com> wrote: > > > > out of curiosity any reason why release 2.1.0 disappeared from > apache.claz.org/hive ? apologies if i missed the conversation about > it. thanks.

Re: [ANNOUNCE] Apache Hive 2.1.1 Released

2016-12-08 Thread Stephen Sprague
out of curiosity any reason why release 2.1.0 disappeared from apache.claz.org/hive ? apologies if i missed the conversation about it. thanks. On Thu, Dec 8, 2016 at 9:58 AM, Jesus Camacho Rodriguez wrote: > The Apache Hive team is proud to

Re: s3a and hive

2016-11-15 Thread Stephen Sprague
way, I reset that back to hdfs and was inserting into an external table located in s3 and *still* got that error above much to my consternation. however, by playing with "hive.exec.stagingdir" (and reading that stackoverflow) i was able to overcome the error. YMMV. Cheers, Stephen. On Tue,

Re: s3a and hive

2016-11-15 Thread Stephen Sprague
to me hive 2.2.0 and perhaps hadoop 2.7 or 2.8 are the only chances >> of success but i'm happy to be told i'm wrong. >> >> thanks, >> Stephen. >> >> >> >> On Mon, Nov 14, 2016 at 10:25 PM, Jörn Franke <jornfra...@gmail.com> >> wrote: >> &

Re: s3a and hive

2016-11-15 Thread Stephen Sprague
ems to me hive 2.2.0 and perhaps hadoop 2.7 or 2.8 are the only chances of success but i'm happy to be told i'm wrong. thanks, Stephen. On Mon, Nov 14, 2016 at 10:25 PM, Jörn Franke <jornfra...@gmail.com> wrote: > Is it a permission issue on the folder? > > On 15 Nov 201

s3a and hive

2016-11-14 Thread Stephen Sprague
so i figured i'd try and set hive.metastore.warehouse.dir=s3a://bucket/hive and see what would happen. running this query: insert overwrite table omniture.hit_data_aws partition (date_key=20161113) select * from staging.hit_data_aws_ext_20161113 limit 1; yields this error: Failed with

Re: a GROUP BY that is not fully grouping

2016-11-03 Thread Stephen Sprague
ha! kinda shows how the tech stack boundaries now are getting blurred, eh? well at least for us amateurs! :o On Thu, Nov 3, 2016 at 5:00 AM, Donald Matthews wrote: > |Spark calls its SQL part HiveContext, but it is not related to this > list > > Oof, I didn't realize

Re: hiveserver2 GC overhead limit exceeded

2016-10-23 Thread Stephen Sprague
ok. i'll bite. lets see the output of this command where Hiveserver2 is running. $ ps -ef | grep -i hiveserver2 this'll show us all the command line parameters HS2 was (ultimately) invoked with. Cheers, Stephen On Sun, Oct 23, 2016 at 6:46 AM, patcharee wrote: >

hiveserver2 and KILLJOB

2016-10-05 Thread Stephen Sprague
hey guys, this is a long shot but i'll ask anyway. We're running YARN and HiveServer2 (v2.1.0) and noticing "random" kills - what looks to me - being issued by HiveServer2. we've turned DEBUG log level on for the Application Master container and see the following in the logs: 2016-10-05

Re: How do I determine a library mismatch between jdbc client and server?

2016-09-28 Thread Stephen Sprague
you might just end up using your own heuristics. if the port is "alive" (ie. you can list it via netstat or telnet to it) but you can't connect... then you got yourself a problem. kinda like a bootstrapping problem, eh? you need to connect to get the version but you can't connect if you don't

Re: Hive queries rejected under heavy load

2016-09-28 Thread Stephen Sprague
gotta start by looking at the logs and run the local client to eliminate HS2. perhaps running hive as such: $ hive -hiveconf hive.root.logger=DEBUG,console do you see any smoking gun? On Wed, Sep 28, 2016 at 7:34 AM, Jose Rozanec wrote: > Hi, > > We have a

Re: Hive 2.x usage

2016-09-14 Thread Stephen Sprague
> * Are you using Hive-2.x at your org and at what scale? yes. we're using 2.1.0. 1.5PB. 30 node cluster. ~1000 jobs a day. And yeah hive 2.1.0 has some issues and can require some finesse wrt the hive-site.xml settings. > * Is the release stable enough? Did you notice any correctness

Re: Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-08 Thread Stephen Sprague
>at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.validateInput(OrcInputFormat.java:508) would it be safe to assume that you are trying to load a text file into a table stored as ORC? your create table doesn't specify that explicitly so that means you have a setting in your configs that says

Re: hive.root.logger influencing query plan?? so it's not so

2016-09-04 Thread Stephen Sprague
ted or not for the query to hang. * so empty result expected. as Gopal mentioned previously this does indeed fix it: * set hive.fetch.task.conversion=none; but not sure its the right thing to set globally just yet. Anyhoo users beware. Regards, Stephen On Wed, Aug 31, 2016 at 7:01 AM, Stephen Sprague <

Re: Beeline throws OOM on large input query

2016-09-02 Thread Stephen Sprague
hmmm. so beeline blew up *before* the query was even submitted to the execution engine? one would think 16G would be plenty for an 8M row sql statement. some suggestions if you feel like going further down the rabbit hole. 1. confirm your beeline java process is indeed running with expanded memory

Re: Beeline throws OOM on large input query

2016-09-01 Thread Stephen Sprague
lemme guess. your query contains an 'in' clause with 1 million static values? :) * brute force solution is to set: HADOOP_CLIENT_OPTS=-Xmx8G (or whatever) before you run beeline to force a larger memory size (i'm pretty sure beeline uses that env var though i didn't actually check the

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Stephen Sprague
> rogue queries so this really isn't limited to just hive is it? any dbms system perhaps has to contend with this. even malicious rogue queries as a matter of fact. timeouts are a cheap way systems handle this - assuming time is related to resource. i'm sure beeline or whatever client you use

Re: hive.root.logger influencing query plan?? so it's not so

2016-08-31 Thread Stephen Sprague
e of the reasons. > > Cheers, > Vlad > > --- > From: Stephen Sprague <sprag...@gmail.com> > To: "user@hive.apache.org" <user@hive.apache.org> > Cc: > Date: Tue, 30 Aug 2016 20:28:50 -0700 > Subject: hive.root.logger influencing query plan?? so

hive.root.logger influencing query plan?? so it's not so

2016-08-30 Thread Stephen Sprague
Hi guys, I've banged my head on this one all day and i need to surrender. I have a query that hangs (never returns). However, when i turn on logging to DEBUG level it works. I'm stumped. I include here the query, the different query plans (with the only thing different being the log level) and

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
>> >> http://talebzadehmich.wordpress.com >> >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for >> any loss, damage or destruction of data or any other property which may >> arise from relying on this email's technical content is explici

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
n of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 26 August 2016 at 20:32, St

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
arying VIEW_EXPANDED_TEXT | text | VIEW_ORIGINAL_TEXT | text | {quote} wonder if i can perform some surgery here. :o do i feel lucky? On Fri, Aug 26, 2016 at 12:28 PM, Stephen Sprague <sprag...@gmail.com> wrote: > well that doesn't bode well.

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
ction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 26 August 2016 at 16:43, Steph

Re: hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
thanks Gopal. you're right our metastore is using Postgres. very interesting you were able to intuit that! lemme give your suggestions a try and i'll post back. thanks! Stephen On Fri, Aug 26, 2016 at 8:32 AM, Gopal Vijayaraghavan wrote: > > NULL::character%20varying) >

hive 2.1.0 + drop view

2016-08-26 Thread Stephen Sprague
hey guys, this one's a little more strange. hive> create view foo_vw as select * from foo; OK Time taken: 0.376 seconds hive> drop view foo_vw; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException:

Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

2016-08-25 Thread Stephen Sprague
can repro this on master. I’ll file a bug... > > From: Stephen Sprague <sprag...@gmail.com> > Reply-To: "user@hive.apache.org" <user@hive.apache.org> > Date: Thursday, August 25, 2016 at 13:34 > To: "user@hive.apache.org" <user@hive.apache.org>

Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

2016-08-25 Thread Stephen Sprague
Hi Gopal, Thank you for this insight. good stuff. The thing is there is no 'foo' for etl_database_source so that filter if anything should be short-circuited to 'true'. ie. double nots. 1. not in 2. and foo not present. it doesn't matter what i put in that "not in" clause the filter

Re: hive throws ConcurrentModificationException when executing insert overwrite table

2016-08-17 Thread Stephen Sprague
indeed +1 to Gopal on that explanation! That was huge. On Wed, Aug 17, 2016 at 12:58 AM, 明浩 冯 wrote: > Hi Gopal, > > > It works when I disabled the dfs.namenode.acls. > > For the data loss, it doesn't affect me too much currently. But I will > track the issue in Kylin. > >

Re: JsonSerDe and mapping tweet's user structure error

2016-08-16 Thread Stephen Sprague
stackoverflow is your friend. that said have a peek at the doc even :) cf. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords paying close attention to this paragraph: {quote} Reserved keywords are permitted

Re: hiver errors

2016-08-10 Thread Stephen Sprague
this error message says everything you need to know: >Likely cause: new client talking to old server. Continuing without it. when you upgrade hive you also need to upgrade the metastore schema. failing to do that can trigger the message you're getting. On Wed, Aug 10, 2016 at 6:41 AM, Mich

Re: beeline/hiveserver2 + logging

2016-08-10 Thread Stephen Sprague
Hi Gopal, Aha! thank you for background behind this. that makes things much more understandable. and ~3000 queries across 10 HS2 servers. sweet. now that's what i call pushing the edge. I like it! Thanks again, Stephen. On Tue, Aug 9, 2016 at 10:29 PM, Gopal Vijayaraghavan

Re: beeline/hiveserver2 + logging

2016-08-09 Thread Stephen Sprague
well, well. i just found this: https://issues.apache.org/jira/browse/HIVE-14183 seems something changed between 1.2.1 and 2.1.0. i'll see if the Rx as prescribed in that ticket does indeed work for me. Thanks, Stephen. On Tue, Aug 9, 2016 at 5:12 PM, Stephen Sprague <sprag...@gmail.com>

beeline/hiveserver2 + logging

2016-08-09 Thread Stephen Sprague
hey guys, try as i might i cannot seem to get beeline (via jdbc) to log information back from hiveserver2 like job_id, progress and that kind of information (similar to what the local beeline or hive clients do.) i see this ticket that is closed: https://issues.apache.org/jira/browse/HIVE-7615

Re: msck repair table and hive v2.1.0

2016-07-14 Thread Stephen Sprague
but the external table created was pointing to HDFS. Was that intentional? > > ~Rajesh.B > > On Fri, Jul 15, 2016 at 6:58 AM, Stephen Sprague <sprag...@gmail.com> > wrote: > >> in the meantime given my tables are in s3 i've written a utility to do a >> 'aws s3

Re: msck repair table and hive v2.1.0

2016-07-14 Thread Stephen Sprague
but in a non-portable way. oh well. gotta do what ya gotta do. On Wed, Jul 13, 2016 at 9:29 PM, Stephen Sprague <sprag...@gmail.com> wrote: > hey guys, > i'm using hive version 2.1.0 and i can't seem to get msck repair table to > work. no matter what i try i get the 'ol NPE. I

msck repair table and hive v2.1.0

2016-07-13 Thread Stephen Sprague
hey guys, i'm using hive version 2.1.0 and i can't seem to get msck repair table to work. no matter what i try i get the 'ol NPE. I've set the log level to 'DEBUG' but yet i still am not seeing any smoking gun. would anyone here have any pointers or suggestions to figure out what's going wrong?

Tez issues with beeline via HS2

2016-02-17 Thread Stephen Sprague
Hi guys, it was suggested i post to the user@hive group rather than the user@tez group for this one. Here's my issue. My query hangs when using beeline via HS2 (but works with the local beeline client). I'd like to overcome that. This is my query: beeline -u 'jdbc:hive2://

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Stephen Sprague
i refuse to take anybody seriously who has a sig file longer than one line and that there is just plain repugnant. On Wed, Feb 3, 2016 at 1:47 PM, Mich Talebzadeh wrote: > I just did some further tests joining a 5 million rows FACT tables with 2 > DIMENSION tables. > > > >

Re: Hive job name

2015-04-07 Thread Stephen Sprague
explicitly is a workaround but... that's a boat load of code changes! Would not there be a fix to roll this back to how it got the job.name before? Thanks, Stephen Sprague On Wed, Mar 11, 2015 at 1:38 PM, Viral Bajaria viral.baja...@gmail.com wrote: I haven't used Tez but it's in my list

Re: bug in hive

2014-09-20 Thread Stephen Sprague
great policy. install open source software that's not even version 1.0 into production and then not allow the ability to improve it (but of course reap all the rewards of its benefits.) so instead of actually fixing the problem the right way introduce a super-hack work-around cuz, you know,

Re: Mysql - Hive Sync

2014-09-06 Thread Stephen Sprague
it under hive warehouse as table and query from there. *Regards Muthupandi.K* On Sat, Sep 6, 2014 at 4:47 AM, Stephen Sprague sprag...@gmail.com wrote: great find, Muthu. I would be interested in hearing about any success or failures using

Re: Mysql - Hive Sync

2014-09-05 Thread Stephen Sprague
great find, Muthu. I would be interested in hearing about any success or failures using this adapter. almost sounds too good to be true. After reading the blog ( http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-2.html) about it i see it comes with caveats and it

Re: Altering the Metastore on EC2

2014-08-14 Thread Stephen Sprague
i'll take a stab at this. - probably no reason. - if you can. is there a derby client s/t you can issue the command: alter table COLUMNS_V2 modify TYPE_NAME varchar(32672). otherwise maybe use the mysql or postgres metastores (instead of derby) and run that alter command after the install. -

Re: Predicate pushdown optimisation not working for ORC

2014-04-03 Thread Stephen Sprague
wow. good find. i hope these config settings are well documented and that you didn't have to spend a lot of time searching for that. Interesting that the default isn't true for this one. On Wed, Apr 2, 2014 at 11:00 PM, Abhay Bansal abhaybansal.1...@gmail.com wrote: I was able to resolve the issue

Re: Partitioned table to partitioned table

2014-03-26 Thread Stephen Sprague
the error message is correct. remember the partition columns are not stored with the data and by doing a select * you're pulling them in anyway. And this has nothing to do with ORC either, it's a Hive thing. :) so your second approach was close. just omit the partition columns yr, mo, day. On Wed, Mar 26,

Re: computing median and percentiles

2014-03-20 Thread Stephen Sprague
be used to derive the percentile? Value Count 100 2 200 4 300 1 Thanks, Seema From: Stephen Sprague sprag...@gmail.com Reply-To: user@hive.apache.org user@hive.apache.org Date: Thursday, March 20, 2014 5:28 AM To: user@hive.apache.org user
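A minimal sketch of one way to answer the value/count question above, written in Python rather than HiveQL. It assumes the nearest-rank definition of a percentile, which is only one of several conventions and may not match what Hive's own percentile function returns; the function name `percentile_from_counts` is made up for illustration.

```python
import math

# Pre-aggregated data from the question: (value, number of occurrences).
VALUE_COUNTS = [(100, 2), (200, 4), (300, 1)]


def percentile_from_counts(value_counts, p):
    """Nearest-rank percentile for 0 < p <= 1 over (value, count) pairs.

    Assumption: nearest-rank definition, i.e. the smallest value whose
    cumulative count reaches ceil(p * total). Other definitions interpolate.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    total = sum(count for _, count in value_counts)
    if total <= 0:
        raise ValueError("no observations")
    rank = math.ceil(p * total)
    running = 0
    # Walk the values in ascending order, accumulating counts.
    for value, count in sorted(value_counts):
        running += count
        if running >= rank:
            return value
    raise ValueError("counts must be positive")


median = percentile_from_counts(VALUE_COUNTS, 0.5)
print(median)
```

The point is that the rows never need to be expanded: a running sum of the counts over the sorted values is enough to locate the rank.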

Re: Improving self join time

2014-03-20 Thread Stephen Sprague
hmm. would this not fall under the general problem of identifying duplicates? Would something like this meet your needs? (untested) select -- outer query finds the ids for the duplicates key from ( -- inner query lists duplicate values select count(*) as cnt, value
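For what it's worth, here is a small runnable sketch of the "find the duplicates, then fetch their ids" idea, using Python's built-in sqlite3 as a stand-in for Hive. The table name `tags`, its columns and the sample rows are all hypothetical, and SQLite is not Hive, so treat it as an illustration of the query shape only. It joins back to the base table so the key column is still available in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (id, tag value).
conn.execute("CREATE TABLE tags (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c")],
)

# Inner query lists values that occur more than once;
# joining back to the base table recovers the ids that carry them.
rows = conn.execute(
    """
    SELECT t.key, t.value
    FROM tags t
    JOIN (
        SELECT value
        FROM tags
        GROUP BY value
        HAVING COUNT(*) > 1
    ) d ON t.value = d.value
    ORDER BY t.value, t.key
    """
).fetchall()

print(rows)
```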

Re: Improving self join time

2014-03-20 Thread Stephen Sprague
will give me a list of duplicate elements and their counts, but it loses the information as to what id had these elements. I'm trying to find which pairs of ids have any duplicate tags. On Thu, Mar 20, 2014 at 11:57 AM, Stephen Sprague sprag...@gmail.com wrote: hmm. would this not fall under

Re: Improving self join time

2014-03-20 Thread Stephen Sprague
select key from (query result that doesn't contain the key field) ... On Thu, Mar 20, 2014 at 1:28 PM, Stephen Sprague sprag...@gmail.com wrote: I agree with your assessment of the inner query. why stop there though? Doesn't the outer query fetch the ids of the tags that the inner query identified

Re: computing median and percentiles

2014-03-19 Thread Stephen Sprague
not a hive question is it? it's more like a math question. On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar sda...@yahoo-inc.com wrote: I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is across two

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-14 Thread Stephen Sprague
! On Fri, Mar 14, 2014 at 4:21 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Can you first try updating hive to at least 0.11 if you can not move to 0.12 ? On Fri, Mar 14, 2014 at 4:49 PM, Arafat, Moiz moiz.ara...@teamaol.com wrote: My comments inline *From:* Stephen Sprague [mailto:sprag

Re: Writing data to LOCAL with Hive Server2

2014-03-14 Thread Stephen Sprague
re: HiveServer2 this is not natively possible (this falls under the export rubric.) similarly, you can't load a file directly from your client using native syntax (import.) Believe me, you're not the only one who'd like both of these functions. :) I'd search this list for import or export

Re: Writing data to LOCAL with Hive Server2

2014-03-14 Thread Stephen Sprague
, there is no way to reach those files from our boxes. That's why I was asking about writing it locally. I'll check this list for import/export like you mentioned. Thanks. On Friday, March 14, 2014 12:23 PM, Stephen Sprague sprag...@gmail.com wrote: re: HiveServer2 this is not natively

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-13 Thread Stephen Sprague
/user/moiztcs/moiz_partition_test/partition_hr=2 5) hive select distinct partition_hr from moiz_partition_test order by partition_hr; OK 0 1 10 2 Thanks, Moiz *From:* Stephen Sprague [mailto:sprag...@gmail.com] *Sent:* Wednesday, March 12, 2014 9:58 PM *To:* user
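The order reported above (0 1 10 2) is what you get when the values are compared as strings rather than as integers. A tiny Python sketch of the difference, offered only as an illustration of the symptom and not as a diagnosis of what Hive was doing internally:

```python
# The four partition_hr values from the output above, held as text.
values = ["2", "10", "0", "1"]

# Compared as strings: character by character, so "10" sorts before "2".
as_strings = sorted(values)

# Compared as integers: numeric order.
as_ints = sorted(int(v) for v in values)

print(as_strings)
print(as_ints)
```

If a query returns the first ordering for a column declared as int, something along the way is treating the value as text.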

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-12 Thread Stephen Sprague
/user/moiztcs/moiz_partition_test/10 4) Ran the sql hive select distinct partition_hr from moiz_partition_test order by partition_hr; Ended Job OK 0 1 10 2 Thanks, Moiz *From:* Stephen Sprague [mailto:sprag...@gmail.com] *Sent:* Wednesday, March 12, 2014 12:55 AM

Re: full outer join result

2014-03-12 Thread Stephen Sprague
interesting. don't know the answer but could you change the UNION in the Postgres to UNION ALL? I'd be curious if the default is UNION DISTINCT on that platform. That would at least partially explain postgres behaviour leaving hive the odd man out. On Wed, Mar 12, 2014 at 6:47 AM, Martin
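On the UNION question: in standard SQL a plain UNION removes duplicate rows and UNION ALL keeps them. A quick way to see the difference without a cluster is Python's built-in sqlite3, used here purely as a stand-in (it is neither Postgres nor Hive); the tables `a` and `b` and their contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (v INTEGER)")
conn.execute("CREATE TABLE b (v INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# Plain UNION: duplicates are removed across and within both inputs.
union_rows = conn.execute(
    "SELECT v FROM a UNION SELECT v FROM b ORDER BY v"
).fetchall()

# UNION ALL: every input row is kept.
union_all_rows = conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v"
).fetchall()

print(union_rows)
print(union_all_rows)
```

So if one engine is running UNION and the other effectively UNION ALL, differing row counts are expected rather than a bug.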

Re: full outer join result

2014-03-12 Thread Stephen Sprague
AM, Stephen Sprague sprag...@gmail.com wrote: interesting. don't know the answer but could you change the UNION in the Postgres to UNION ALL? I'd be curious if the default is UNION DISTINCT on that platform. That would at least partially explain postgres behaviour leaving hive the odd man

additional hive functions

2014-03-12 Thread Stephen Sprague
just a public service announcement. I had a case where i had a nested json array in a string and i needed that to act like a first class array in hive. natively, you can pull it out but it'll just be a string. woe is me. I searched around the web and found this:

Re: Hive - Sorting on the Partition Column data type Int . Output is Alphabetic Sort

2014-03-11 Thread Stephen Sprague
that makes no sense. if the column is an int it isn't going to sort like a string. I smell a user error somewhere. On Tue, Mar 11, 2014 at 6:21 AM, Arafat, Moiz moiz.ara...@teamaol.com wrote: Hi, I have a table that has a partition column partition_hr. Data Type is int (partition_hr

Re: bucketed table problems

2014-03-07 Thread Stephen Sprague
yeah. that's not right. 1. lets see the output of show create table foo 2. what version of hive are you using. On Fri, Mar 7, 2014 at 11:46 AM, Keith Wiley kwi...@keithwiley.com wrote: I want to convert a table to a bucketed table, so I made a new table with the same schema as the old table

Re: bucketed table problems

2014-03-07 Thread Stephen Sprague
short answer: its by position.

Re: HIVE QUERY HELP:: HOW TO IMPLEMENT THIS CASE

2014-03-04 Thread Stephen Sprague
Let's just say this. Coercing hive into doing something it's not meant to do is kinda a waste of time. Sure you can rewrite any update as a delete/insert but that's not the point of Hive. Seems like you're going down a path here that's not optimal for your situation. You know, I could buy a Tesla

Re: HIVE QUERY HELP:: HOW TO IMPLEMENT THIS CASE

2014-03-04 Thread Stephen Sprague
on 'AGE' since its part of the where clause in the -- derived table. {code} i switched your ON clause and WHERE clause so be sure to take that under consideration. And finally its not tested. Best of luck. Cheers, Stephen On Tue, Mar 4, 2014 at 7:49 AM, Stephen Sprague sprag...@gmail.com wrote

Re: move hive tables from one cluster to another cluster

2014-02-28 Thread Stephen Sprague
this is a FAQ. see doc on: msck repair table table this will scan hdfs and create the corresponding partitions in the metastore. On Fri, Feb 28, 2014 at 12:59 AM, shashwat shriparv dwivedishash...@gmail.com wrote: Where was your meta data in derby or MySql? *Warm Regards_**∞_* *

Re: Hive + Flume

2014-02-28 Thread Stephen Sprague
if you can configure flume to create temporary files that start with an underscore (_) i believe hive will safely ignore them. otherwise you have to write a script to move them out. On Fri, Feb 28, 2014 at 11:09 AM, P lva ruvi...@gmail.com wrote: Hi, I have a flume stream that stores data in
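A rough sketch of the naming rule the reply above relies on. The assumption, labeled as such, is that the input format skips any file whose name starts with an underscore or a dot, the way Hadoop's hidden-file filter does; the helper `is_visible_to_query` and the file names are invented for illustration.

```python
def is_visible_to_query(file_name):
    """Assumed rule: names starting with '_' or '.' are treated as hidden."""
    return not file_name.startswith(("_", "."))


# Hypothetical directory listing: one file still being written, one finished.
listing = ["_events.1393614540.tmp", "events.1393614540.log", ".staging"]
visible = [name for name in listing if is_visible_to_query(name)]

print(visible)
```

In practice that means having the writer use an underscore-prefixed name while the file is open and rename it once it is complete.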

Re: move hive tables from one cluster to another cluster

2014-02-28 Thread Stephen Sprague
that advice is way over complicating something that is very easy. instead, please take this approach. 1. run the ddl to create the table on the new cluster 2. distcp the hdfs data into the appropriate hdfs directory. 3. run msck repair table table in hive to discover the partitions and populate

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-22 Thread Stephen Sprague
yeah. That traceback pretty much spells it out - its metastore related and that's where the partitions are stored. I'm with the others on this. HiveServer2 is still a little jankey on memory management. I bounce mine once a day at midnight just to play it safe (and because i can.) Again, for

Re: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-21 Thread Stephen Sprague
most interesting. we had an issue recently with querying a table with 15K columns and running out of heap storage but not 15K partitions. 15K partitions shouldn't be causing a problem in my humble estimation. Maybe a million but not 15K. :) So is there a traceback we can look at? or its not

Re: Slow performance on queries with aggregation function

2014-02-21 Thread Stephen Sprague
Hi Jone, um. i can say for sure something is wrong. :) i would _start_ by going to the tasktracker. this is your friend. find your job and look for failed reducers. That's the starting point anyway, IMHO. On Fri, Feb 21, 2014 at 11:35 AM, Jone Lura jone.l...@ecc.no wrote: Hi, I have

Re: Issue with Hive and table with lots of column

2014-02-19 Thread Stephen Sprague
. On Tue, Feb 18, 2014 at 10:57 AM, David Gayou david.ga...@kxen.com wrote: Sorry i badly reported it. It's 8192M Thanks, David. On 18 Feb 2014 at 18:37, Stephen Sprague sprag...@gmail.com wrote: oh. i just noticed the -Xmx value you reported. there's no M or G after that number?? I'd like

Re: Issue with Hive and table with lots of column

2014-02-18 Thread Stephen Sprague
-0.13, HiveServer2 takes less memory than before. Could you try it with the version in trunk? 2014-02-13 10:49 GMT+09:00 Stephen Sprague sprag...@gmail.com: question to the original poster. closure appreciated! On Fri, Jan 31, 2014 at 12:22 PM, Stephen Sprague sprag...@gmail.com wrote

Re: Hive Query :: Implementing case statement

2014-02-18 Thread Stephen Sprague
maybe consider something along these lines. nb. not tested. -- temp table holding new balances + key create table NEW_BALS as select * from ( select b.prev as NEW_BALANCE, a.key from TABLE_SQL a join TABLE_SQL_2 b on (a.key=b.key) where a.code='1' UNION ALL select b.prev as

Re: Issue with Hive and table with lots of column

2014-02-18 Thread Stephen Sprague
return some pretty long command with a -Xmx8192 and that's the value set in hive-env.sh 2. The select * from table limit 1 or even 100 is working correctly. David. On Tue, Feb 18, 2014 at 4:16 PM, Stephen Sprague sprag...@gmail.com wrote: He lives on after all! and thanks for the continued

Re: Issue with Hive and table with lots of column

2014-02-18 Thread Stephen Sprague
oh. i just noticed the -Xmx value you reported. there's no M or G after that number?? I'd like to see -Xmx8192M or -Xmx8G. That *is* very important. thanks, Stephen. On Tue, Feb 18, 2014 at 9:22 AM, Stephen Sprague sprag...@gmail.com wrote: thanks. re #1. we need to find
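Why the suffix matters: the JVM reads a bare -Xmx number as bytes, and k, m or g (either case) scale it by powers of 1024. The sketch below just does that arithmetic in Python; `xmx_bytes` is a hypothetical helper, not part of any Hadoop or Hive tooling, and it only handles the simple forms discussed here.

```python
import re

# Multipliers for the size suffixes accepted after the number.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(flag):
    """Return the heap size in bytes requested by a flag such as '-Xmx8G'."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError(f"not a simple -Xmx flag: {flag!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


for flag in ("-Xmx8192", "-Xmx8192M", "-Xmx8G"):
    print(flag, xmx_bytes(flag))
```

So -Xmx8192 asks for 8192 bytes, while -Xmx8192M and -Xmx8G both ask for 8 GiB.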

Re: Views and partitions and performance

2014-02-11 Thread Stephen Sprague
great questions, Burak. Personally, I had not before seen the create view ... partition on construct. Not that that means anything but thanks for bringing it out into the forefront! So, yeah, do we have an SME out there that would like to elaborate on this beyond the aforementioned url?

Re: FUNCTION HIVE to DAYS OF WEEK

2014-02-10 Thread Stephen Sprague
oddly enough i don't see one here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions however, you're not the only one finding something like this useful. cf. https://issues.apache.org/jira/browse/HIVE-6046 in the meantime it appears as though
