Re: Hive splits/adds rows when outputting dataset with new lines
Try with set hive.default.fileformat=SequenceFile;

Thanks,
Navis

2014-10-06 20:51 GMT+09:00 Maciek mac...@sonra.io:

Hello,

I've encountered a situation where printing new lines corrupts (multiplies) the returned dataset. This seems to be similar to HIVE-3012 https://issues.apache.org/jira/browse/HIVE-3012 (fixed in 0.11), but I'm on Hive 0.13 and it's still the case. Here are the steps to illustrate/reproduce:

1. First, let's create a table with one row and one column by selecting from any existing table (substitute ANYTABLE accordingly):

CREATE TABLE singlerow AS SELECT 'worldofhostels' wordsmerged FROM ANYTABLE LIMIT 1;

and verify:

SELECT * FROM singlerow;
OK
worldofhostels
Time taken: 0.028 seconds, Fetched: 1 row(s)

All good so far.

2. Now let's introduce a newline:

SELECT regexp_replace(wordsmerged, 'of', '\nof\n') wordsseparate FROM singlerow;
OK
world
of
hostels
Time taken: 6.404 seconds, Fetched: 3 row(s)

and I'm suddenly getting 3 rows.

3. This is not just CLI output, as a CTAS materializes the corrupted result set:

CREATE TABLE corrupted AS SELECT regexp_replace(wordsmerged, 'of', '\nof\n') wordsseparate, wordsmerged FROM singlerow;

hive> select * from corrupted;
OK
world	NULL
of	NULL
hostels	worldofhostels
Time taken: 0.029 seconds, Fetched: 3 row(s)

Apparently the same happens: the new table is split into multiple rows, and the columns following the one in question (wordsmerged here) become NULLs. Am I doing something wrong here?

Regards,
Maciek
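A minimal sketch (not Hive source, just an illustration) of why Navis's suggestion helps: Hive's default TextFile format delimits rows with newlines, so a value containing '\n' is indistinguishable from a row boundary when the file is read back. SequenceFile stores record lengths instead of relying on delimiters, so embedded newlines survive.

```python
# Sketch of newline-delimited (TextFile-style) serialization, assuming
# '\t' as the field delimiter and '\n' as the row delimiter.

def write_textfile(rows):
    """One row per line, columns joined by the field delimiter."""
    return "\n".join("\t".join(cols) for cols in rows) + "\n"

def read_textfile(data):
    """Split on newlines -- the reader cannot tell a row boundary from a
    newline embedded inside a value."""
    return [line.split("\t") for line in data.rstrip("\n").split("\n")]

# One logical row whose first column contains embedded newlines:
rows = [["world\nof\nhostels", "worldofhostels"]]
result = read_textfile(write_textfile(rows))
print(result)
# The single row comes back as three rows, and the trailing column of the
# first two is missing (Hive renders missing columns as NULL) -- exactly
# the world/NULL, of/NULL, hostels/worldofhostels output reported above.
```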
Re: HiveException: Stateful expressions cannot be used inside of CASE
A stateful function must be called for all input rows, but inside an if/when clause that cannot be guaranteed. Is there a reason to declare the protect_column function stateful?

Thanks,
Navis

2014-09-25 3:42 GMT+09:00 Dan Fan d...@appnexus.com:

Hi Hive users,

I have a generic Hive UDF called protect_column. The UDF works fine when I call it alone, but when I run the following query:

select case when id = 5 then protect_column(id, 'age', 12L) else id end from one_row_table;

it says:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Stateful expressions cannot be used inside of CASE

I was reading the source code, and I think it is related to GenericCase and GenericWhen, according to https://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java?p=1197837

Could anyone explain what exactly GenericCase and GenericWhen are, and why we cannot put the UDF inside a CASE WHEN?

Thanks for your time helping me out.

Best,
Dan
Hive Full embedded for test
Hi,

I need to test Hive queries in a fully local mode. For the metastore, there is no problem starting in embedded mode, and I'm using Hive JDBC. My problem is that the ExecDriver tries to launch /usr/bin/hadoop. Is there a way to run in a fully embedded mode (from a provided jar)?
RE: Hive Join returns incorrect results on bigint=string
Based on this wiki page: https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-TypeSystem

The string will undergo an implicit conversion to double, as double is the only common ancestor of bigint and string. So the result is unpredictable once you are comparing doubles.

Yong

Date: Mon, 6 Oct 2014 14:20:57 -0700
Subject: Hive Join returns incorrect results on bigint=string
From: a...@rocketfuelinc.com
To: user@hive.apache.org

Recently, by mistake, I encountered a situation where I ended up doing a join key comparison between a string and a bigint. The returned results are incorrect even though the strings have exactly the same integer values as the bigint values. When I do a join on bigint = cast(string as bigint), the results are correct. Is this the expected behavior, or is Hive supposed to do an automatic cast and compare as strings?

--
Thanks and Regards,
Ashu Pachauri
Rocket Scientist, Rocket Fuel Inc.
1-650-200-5390
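A short sketch of why the double promotion Yong describes breaks the join: an IEEE 754 double has only 53 bits of mantissa, so distinct bigints above 2**53 can collapse to the same double value, making unequal keys compare equal (the values below are illustrative, not from the original report).

```python
# Two distinct 64-bit integers beyond double precision:
big = 2 ** 60  # 1152921504606846976

assert big != big + 1                # distinct as bigints
assert float(big) == float(big + 1)  # equal once promoted to double!

# So "bigint_col = string_col" (both sides promoted to double) can match
# the wrong rows or miss matches. Casting the string side explicitly,
# as in "ON bigint_col = cast(string_col AS bigint)", compares exact
# integer values instead:
assert int("1152921504606846977") == big + 1
```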
NULL DEFINED AS
I see there is a NULL DEFINED AS clause in the row_format for a table. Is there a way to set NULL DEFINED AS at the database level? This would make maintenance much easier and reduce human error, while a table-level NULL DEFINED AS in the row_format would still override the database setting for the table in question.

--
Those who say it can't be done, are usually interrupted by those doing it.
Re: Hive Full embedded for test
Hi,

I know this project. I want to know if it's possible to run locally without setting a Hadoop home...

2014-10-07 15:30 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

Check out https://github.com/edwardcapriolo/hive_test

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Re: Hive Full embedded for test
No. It is buried deep in Hive's guts to fork bin/hadoop :)

On Tue, Oct 7, 2014 at 10:16 AM, Yoel Benharrous yoel.benharr...@gmail.com wrote:

Hi, I know this project. I want to know if it's possible to run locally without setting a Hadoop home...
Re: Hive Full embedded for test
That was my first impression browsing the source code... :( My Windows friends will be condemned to switch to Linux.

2014-10-07 16:54 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

No. It is buried deep in Hive's guts to fork bin/hadoop :)
Re: Hive Full embedded for test
The only other way to accomplish this is building .q tests inside the Hive source code. In the end I believe that still forks a hadoop process, but it does all the hard work of downloading, etc. I made hive_test because running Hive .q tests takes way too long, IMHO.

On Tue, Oct 7, 2014 at 12:12 PM, Yoel Benharrous yoel.benharr...@gmail.com wrote:

That was my first impression browsing the source code... :( My Windows friends will be condemned to switch to Linux.
Re: Hive Full embedded for test
Or perhaps patch Hive to avoid the use of /usr/bin/hadoop when local mode is selected.

2014-10-07 18:16 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

The only other way to accomplish this is building .q tests inside the Hive source code. In the end I believe that still forks a hadoop process, but it does all the hard work of downloading, etc.
Templeton API- No WaitForJobToComplete.
Hey guys,

I am trying to submit Hive jobs using the Templeton API; this is my first time using Templeton. How should I wait for the TempletonController job to complete and then exit? Unlike WebHCatHttpClient, Templeton doesn't have a WaitForJobToComplete. Please let me know how you deal with this.

Regards,
Karthik
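Since WebHCat/Templeton exposes no blocking wait call, the usual approach is to poll the job-status endpoint (GET /templeton/v1/jobs/&lt;jobid&gt;) until the job reaches a terminal state. A hedged sketch, with the status fetcher injected as a callable so the loop logic is testable without a cluster; the exact response shape and state names should be verified against your WebHCat version:

```python
import time

def wait_for_job(job_id, fetch_status, poll_interval=1.0, max_polls=100):
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is any callable returning the parsed status JSON, e.g.
    an HTTP GET against the WebHCat jobs endpoint. Assumed response shape:
    {"status": {"state": "PREP" | "RUNNING" | "SUCCEEDED" | ...}}.
    """
    terminal = {"SUCCEEDED", "FAILED", "KILLED"}
    for _ in range(max_polls):
        state = fetch_status(job_id)["status"]["state"]
        if state in terminal:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("job %s did not finish after %d polls" % (job_id, max_polls))
```

In practice fetch_status would wrap something like urllib.request against http://&lt;webhcat-host&gt;:50111/templeton/v1/jobs/&lt;jobid&gt;?user.name=... (host, port, and query parameters are assumptions to check against your deployment).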
Hive Read from Reducer : Is it advisable ?
Hi,

We are having Hive read a few files in HDFS from the reducer(s) of a map-reduce job. This works well when launching a few reducers (say 5), but when we launch more than that, the initial connection to HiveServer2 takes a long time (around 10 minutes). We have configured hive-site.xml to allow parallel execution.

1. Is it advisable to read HDFS data via Hive from reducers, or what are the best practices for this scenario?
2. Is there a way to increase Hive's concurrent-access performance?

Regards,
Malli
Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]
Hi,

I faced a similar situation in my dev cluster (CDH distribution 5.1.3). See the thread details with log files: https://groups.google.com/a/cloudera.org/forum/#!mydiscussions/scm-users/MpcpHj5mWT8

thanks
sanjay

From: John Omernik j...@omernik.com
To: user@hive.apache.org
Sent: Tuesday, September 9, 2014 12:10 PM
Subject: Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]

Well, here is me talking to myself, but in case someone else runs across this: I changed the Hive metastore connect timeout to 600 seconds (per the JIRA below, for Hive 0.14) and now my problem has gone away. It looks like the timeout was causing some craziness. https://issues.apache.org/jira/browse/HIVE-7140

On Tue, Sep 9, 2014 at 1:00 PM, John Omernik j...@omernik.com wrote:

I ran with debug logging, and this is interesting: there was a loss of connection to the metastore client RIGHT before the partition mentioned above, as data was about to be moved around. I wonder if the timing on that is bad?

14/09/09 12:47:37 [main]: INFO exec.MoveTask: Partition is: {day=null, source=null}
14/09/09 12:47:38 [main]: INFO metadata.Hive: Renaming src: maprfs:/user/hive/scratch/hive-mapr/hive_2014-09-09_12-38-30_860_3555291990145206535-1/-ext-1/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0; dest: maprfs:/user/hive/warehouse/intel_flow.db/pcaps/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0; Status: true
14/09/09 12:48:02 [main]: WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)

On Tue, Sep 9, 2014 at 11:02 AM, John Omernik j...@omernik.com wrote:

I am doing a dynamic partition load in Hive 0.13 using ORC files. This has always worked in the past, both with MapReduce v1 and YARN.
I am working with Mesos now, and trying to troubleshoot this weird error:

Failed with exception AlreadyExistsException(message:Partition already exists

What's odd is that my insert is a plain insert (without OVERWRITE), so it's as if two different reducers have data to go into the same partition and there is a collision of some sort. Perhaps there is a situation where the partition doesn't exist prior to the run, but when two reducers have data, they both think they should be the one to create the partition? If a partition already exists, shouldn't the reducer just copy its file into the partition? I am struggling to see why this would be an issue with Mesos, but not on YARN or MRv1. Any thoughts would be welcome.

John
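The race John suspects can be sketched abstractly (a hypothetical stand-in, not the actual Hive metastore API): two tasks each see the partition missing and both try to create it; the loser gets AlreadyExistsException. An idempotent writer treats that exception as success and proceeds to move its file into the partition.

```python
class AlreadyExistsException(Exception):
    pass

class FakeMetastore:
    """Illustrative stand-in for the metastore's partition registry."""
    def __init__(self):
        self.partitions = set()

    def add_partition(self, spec):
        if spec in self.partitions:
            raise AlreadyExistsException(spec)
        self.partitions.add(spec)

def ensure_partition(metastore, spec):
    """Create the partition if needed; tolerate a concurrent creator."""
    try:
        metastore.add_partition(spec)
    except AlreadyExistsException:
        pass  # another task created it first -- safe to continue

ms = FakeMetastore()
spec = "day=2012-11-30/source=20121119_SWAirlines_Spam"
ensure_partition(ms, spec)  # first task creates the partition
ensure_partition(ms, spec)  # second task tolerates the existing one
```

If a framework difference (here Mesos vs. YARN) changes task timing so the two creations overlap, a non-idempotent code path surfaces the exception, which is consistent with the behavior described above.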
Re: Hive splits/adds rows when outputting dataset with new lines
This... works! I'm quite surprised: per the steps I outlined, the issue manifested even without CTAS (a regular SELECT), so I still don't see how that could be related. Or are those two separate issues? Also, maybe you know: is there any way to make it work for TextFile?

Thank you,
Maciek

On Tue, Oct 7, 2014 at 7:13 AM, Navis류승우 navis@nexr.com wrote:

Try with set hive.default.fileformat=SequenceFile;

--
Kind Regards
Maciek Kocon