Re: Hive splits/adds rows when outputting dataset with new lines

2014-10-07 Thread Navis류승우
Try with set hive.default.fileformat=SequenceFile;
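For example, the CTAS step from the report then behaves like this (a sketch; the not_corrupted table name is mine, the rest is taken from the original message):

set hive.default.fileformat=SequenceFile;

-- Sketch: with a SequenceFile-backed table the embedded newlines are stored
-- as part of the column value rather than being re-read as row delimiters.
CREATE TABLE not_corrupted AS
SELECT regexp_replace(wordsmerged, 'of', '\nof\n') AS wordsseparate,
       wordsmerged
FROM singlerow;

SELECT * FROM not_corrupted;   -- expected: one row, not three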

Thanks,
Navis

2014-10-06 20:51 GMT+09:00 Maciek mac...@sonra.io:

 Hello,

 I've encountered a situation where printing newlines corrupts (multiplies)
 the returned dataset.
 This seems similar to HIVE-3012
 https://issues.apache.org/jira/browse/HIVE-3012 (fixed in 0.11), but I'm
 on Hive 0.13 and it's still the case.
 Here are the steps to illustrate/reproduce:

 1. First let's create a table with one row and one column by selecting from
 any existing table (substitute ANYTABLE accordingly):

 CREATE TABLE singlerow AS SELECT 'worldofhostels' wordsmerged FROM
 ANYTABLE LIMIT 1;

 and verify:

 SELECT * FROM singlerow;

 OK---
 worldofhostels

 Time taken: 0.028 seconds, Fetched: 1 row(s)

 All good so far.
 2. Now let's introduce a newline:

 SELECT regexp_replace(wordsmerged, 'of', '\nof\n') wordsseparate FROM
 singlerow;

 OK--

 world
 of
 hostels

 Time taken: 6.404 seconds, Fetched: 3 row(s)
 and I'm suddenly getting 3 rows now.
 3. This is not just CLI output behaviour: when submitting a CTAS, it
 materializes the same corrupted result set:

 CREATE TABLE corrupted AS
 SELECT regexp_replace(wordsmerged, 'of', '\nof\n') wordsseparate,
 wordsmerged FROM singlerow;

 hive> select * from corrupted;

 OK

 world NULL
 of NULL
 hostels worldofhostels

 Time taken: 0.029 seconds, Fetched: 3 row(s)
 Apparently the same thing happens - the new table is split into multiple
 rows, and the columns following the one in question (like wordsmerged)
 become NULLs.
 Am I doing something wrong here?

 Regards,
 Maciek



Re: HiveException: Stateful expressions cannot be used inside of CASE

2014-10-07 Thread Navis류승우
A stateful function has to be called for every input row, but inside an
if/when branch that cannot be guaranteed.

Any reason to declare the protect_column function as stateful?
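If the stateful flag really is required, one possible restructuring (a sketch; it assumes the UDF's result for non-matching rows can simply be discarded) is to evaluate the UDF for every row in a subquery and apply the CASE on top of it:

-- Sketch: the UDF is invoked once per input row in the inner query, which is
-- what a stateful (@UDFType(stateful = true)) function expects; the
-- conditional choice then happens outside the UDF call.
SELECT CASE WHEN id = 5 THEN protected ELSE id END
FROM (
  SELECT id, protect_column(id, 'age', 12L) AS protected
  FROM one_row_table
) t;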

Thanks,
Navis

2014-09-25 3:42 GMT+09:00 Dan Fan d...@appnexus.com:

  Hi Hive Users:

  I have a generic Hive UDF called protect_column.
 The UDF works fine when I call it alone.
 But when I run the following query:


   select case when id = 5 then protect_column(id, 'age', 12L) else id end
 from one_row_table ;


  It says


  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Stateful
 expressions cannot be used inside of CASE


  I was reading the source code. And I think it is related to GenericCase
 and GenericWhen according to


 https://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java?p=1197837

  Could anyone explain what exactly GenericCase and GenericWhen are, and why
 we cannot put the UDF inside a CASE WHEN?

  Thanks for your time helping me out

  Best

  Dan



Hive Full embedded for test

2014-10-07 Thread Yoel Benharrous
Hi,

I need to test Hive queries in a fully local mode.

For the metastore, there is no problem starting in embedded mode.
I'm using Hive JDBC.

My problem is that the ExecDriver tries to launch /usr/bin/hadoop.

Is there a way to launch in a fully embedded mode (from a provided jar)?
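For what it's worth, the usual session-level switches for local execution look like this (a sketch; as the follow-ups later in this digest note, Hive still forks bin/hadoop for execution, so this alone may not remove the /usr/bin/hadoop dependency):

-- Sketch: run jobs locally instead of submitting to a cluster.
SET mapred.job.tracker=local;
SET hive.exec.mode.local.auto=true;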


RE: Hive Join returns incorrect results on bigint=string

2014-10-07 Thread java8964
Based on this wiki page:
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-TypeSystem
The string will undergo an implicit conversion to double, since double is the
only common ancestor of bigint and string.
So the result is unpredictable once you are comparing doubles.
Yong
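
To illustrate the difference (a sketch; the table and column names are hypothetical):

-- Implicit comparison: both sides are promoted to DOUBLE, and bigints larger
-- than about 2^53 can no longer be represented exactly, so matches are lost.
SELECT a.id
FROM big_ids a
JOIN string_ids b ON a.id = b.id_str;

-- Explicit cast (what the original poster ended up doing): the comparison
-- stays in BIGINT and is exact.
SELECT a.id
FROM big_ids a
JOIN string_ids b ON a.id = CAST(b.id_str AS BIGINT);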

Date: Mon, 6 Oct 2014 14:20:57 -0700
Subject: Hive Join returns incorrect results on bigint=string
From: a...@rocketfuelinc.com
To: user@hive.apache.org

Recently, by mistake, I encountered a situation where I ended up doing a join
key comparison between a string and a bigint. The returned results are
incorrect even though the strings have exactly the same integer values as the
bigint values.
When I do a join on bigint = cast(string as bigint), the results are correct.
Is this the expected behavior, or is Hive supposed to do an automatic cast and
compare as strings?
-- 
Thanks and Regards,
Ashu Pachauri
Rocket Scientist, Rocket Fuel Inc.
1-650-200-5390
  

NULL DEFINED AS

2014-10-07 Thread Al Pivonka
I see there is a NULL DEFINED AS clause in the row_format for a table.
Is there a way to set NULL DEFINED AS at the database level?

This would make maintenance much easier and reduce human error, while the
NULL DEFINED AS in a table's own row_format would still override the database
setting for the table in question.
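For reference, the table-level forms look roughly like this (a sketch; I'm not aware of a database-level equivalent, and as far as I know serialization.null.format is the serde property that NULL DEFINED AS maps to):

-- At creation time:
CREATE TABLE t (id INT, name STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  NULL DEFINED AS ''          -- write NULLs as empty strings
STORED AS TEXTFILE;

-- Or on an existing table:
ALTER TABLE t SET SERDEPROPERTIES ('serialization.null.format' = '');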




-- 
Those who say it can't be done, are usually interrupted by those doing it.


Re: Hive Full embedded for test

2014-10-07 Thread Yoel Benharrous
Hi,

I know this project. I want to know if it's possible to run locally without
setting hadoop home ...

2014-10-07 15:30 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 Check out.
 https://github.com/edwardcapriolo/hive_test

 On Tuesday, October 7, 2014, Yoel Benharrous yoel.benharr...@gmail.com
 wrote:

 Hi,

 I need to test hive request in a full local mode.

 For the metastore, no problem to start in an embedded mode.
 I'm using hive jdbc.

 My problem is that the ExecDriver try to launch /usr/bin/hadoop.

 is there a way to launch in a full embedded mode? (from a provided jar)

 --
 Sorry this was sent from mobile. Will do less grammar and spell check than
 usual.



Re: Hive Full embedded for test

2014-10-07 Thread Edward Capriolo
No. It is buried deep in hive's guts to fork bin/hadoop :)

On Tue, Oct 7, 2014 at 10:16 AM, Yoel Benharrous yoel.benharr...@gmail.com
wrote:

 Hi,

 I know this project. I want to know if it's possible to run locally
 without setting hadoop home ...

 2014-10-07 15:30 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 Check out.
 https://github.com/edwardcapriolo/hive_test

 On Tuesday, October 7, 2014, Yoel Benharrous yoel.benharr...@gmail.com
 wrote:

 Hi,

 I need to test hive request in a full local mode.

 For the metastore, no problem to start in an embedded mode.
 I'm using hive jdbc.

 My problem is that the ExecDriver try to launch /usr/bin/hadoop.

 is there a way to launch in a full embedded mode? (from a provided jar)


 --
 Sorry this was sent from mobile. Will do less grammar and spell check
 than usual.





Re: Hive Full embedded for test

2014-10-07 Thread Yoel Benharrous
That was my first impression when browsing the source code ... :(
My Windows friends will be condemned to switch to Linux.

2014-10-07 16:54 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 No. It is buried deep in hive's guts to fork bin/hadoop :)

 On Tue, Oct 7, 2014 at 10:16 AM, Yoel Benharrous 
 yoel.benharr...@gmail.com wrote:

 Hi,

 I know this project. I want to know if it's possible to run locally
 without setting hadoop home ...

 2014-10-07 15:30 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 Check out.
 https://github.com/edwardcapriolo/hive_test

 On Tuesday, October 7, 2014, Yoel Benharrous yoel.benharr...@gmail.com
 wrote:

 Hi,

 I need to test hive request in a full local mode.

 For the metastore, no problem to start in an embedded mode.
 I'm using hive jdbc.

 My problem is that the ExecDriver try to launch /usr/bin/hadoop.

 is there a way to launch in a full embedded mode? (from a provided jar)

 --
 Sorry this was sent from mobile. Will do less grammar and spell check
 than usual.






Re: Hive Full embedded for test

2014-10-07 Thread Edward Capriolo
The only other way to accomplish this is building .q tests inside the Hive
source code. In the end I believe that still forks a hadoop process, but it does
all the hard work of downloading etc. I made hive_test because running the Hive
.q tests takes way too long IMHO.
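For anyone who hasn't seen them: a .q test is just a HiveQL script checked in under the query-test tree, which the build runs against a fresh mini warehouse and diffs against a recorded .q.out file. A hypothetical minimal one might look like:

-- Hypothetical file: ql/src/test/queries/clientpositive/my_feature.q
-- (its expected output would be recorded under ql/src/test/results/clientpositive/)
CREATE TABLE my_feature_t (id INT, name STRING);
DESCRIBE my_feature_t;
SHOW TABLES 'my_feature_t';
DROP TABLE my_feature_t;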

On Tue, Oct 7, 2014 at 12:12 PM, Yoel Benharrous yoel.benharr...@gmail.com
wrote:

 It was my first feeling browsing the  source code ... :(
 My Windows friends will be condemned to switch to Linux.

 2014-10-07 16:54 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 No. It is buried deep in hive's guts to fork bin/hadoop :)

 On Tue, Oct 7, 2014 at 10:16 AM, Yoel Benharrous 
 yoel.benharr...@gmail.com wrote:

 Hi,

 I know this project. I want to know if it's possible to run locally
 without setting hadoop home ...

 2014-10-07 15:30 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 Check out.
 https://github.com/edwardcapriolo/hive_test

 On Tuesday, October 7, 2014, Yoel Benharrous yoel.benharr...@gmail.com
 wrote:

 Hi,

 I need to test hive request in a full local mode.

 For the metastore, no problem to start in an embedded mode.
 I'm using hive jdbc.

 My problem is that the ExecDriver try to launch /usr/bin/hadoop.

 is there a way to launch in a full embedded mode? (from a provided jar)

 --
 Sorry this was sent from mobile. Will do less grammar and spell check
 than usual.







Re: Hive Full embedded for test

2014-10-07 Thread Yoel Benharrous
Or perhaps to patch Hive to avoid the use of /usr/bin/hadoop when local
mode is selected.


2014-10-07 18:16 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 The only other way to accomplish this is building .q tests inside hive
 source code. In the end I believe that still forks a hadoop but it does all
 the hardwark of downloading etc. I made hive-test because running a hive q
 tests takes way to long INHO.

 On Tue, Oct 7, 2014 at 12:12 PM, Yoel Benharrous 
 yoel.benharr...@gmail.com wrote:

 It was my first feeling browsing the  source code ... :(
 My Windows friends will be condemned to switch to Linux.

 2014-10-07 16:54 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 No. It is buried deep in hive's guts to fork bin/hadoop :)

 On Tue, Oct 7, 2014 at 10:16 AM, Yoel Benharrous 
 yoel.benharr...@gmail.com wrote:

 Hi,

 I know this project. I want to know if it's possible to run locally
 without setting hadoop home ...

 2014-10-07 15:30 GMT+02:00 Edward Capriolo edlinuxg...@gmail.com:

 Check out.
 https://github.com/edwardcapriolo/hive_test

 On Tuesday, October 7, 2014, Yoel Benharrous 
 yoel.benharr...@gmail.com wrote:

 Hi,

 I need to test hive request in a full local mode.

 For the metastore, no problem to start in an embedded mode.
 I'm using hive jdbc.

 My problem is that the ExecDriver try to launch /usr/bin/hadoop.

 is there a way to launch in a full embedded mode? (from a provided
 jar)

 --
 Sorry this was sent from mobile. Will do less grammar and spell check
 than usual.








Templeton API- No WaitForJobToComplete.

2014-10-07 Thread karthik Srivasthava
Hey Guys,

I am trying to submit Hive jobs using the Templeton API. I am using
Templeton for the first time.

How should I wait for the TempletonController job to complete and then exit?
Unlike WebHCatHttpClient,
Templeton doesn't have WaitForJobToComplete.

Please let me know how you deal with this.

Regards,
Karthik


Hive Read from Reducer : Is it advisable ?

2014-10-07 Thread Sundaramoorthy, Malliyanathan
Hi,
We are having the reducer(s) of a map-reduce job read a few files in HDFS
through Hive.

This works well when launching a few reducers (say 5). But when we launch more
than that, the initial connection to HiveServer2 takes a long time (around 10
minutes).
We have configured hive-site.xml to allow parallel execution.


1.  Is it advisable to read HDFS data via Hive from reducers, or what
are the best practices for this scenario?

2.  Is there a way to increase Hive's concurrent-access performance?



Regards,
Malli





Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-10-07 Thread Sanjay Subramanian
hi
I faced a similar situation in my dev cluster (CDH distribution 5.1.3).
See the thread details with log files:
https://groups.google.com/a/cloudera.org/forum/#!mydiscussions/scm-users/MpcpHj5mWT8


thanks
sanjay

From: John Omernik j...@omernik.com
 To: user@hive.apache.org 
 Sent: Tuesday, September 9, 2014 12:10 PM
 Subject: Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]
   
Well, here I am talking to myself, but in case someone else runs across this:
I changed the Hive metastore connect timeout to 600 seconds (per the JIRA below
for Hive 0.14) and now my problem has gone away. It looks like the timeout was
causing some craziness.
https://issues.apache.org/jira/browse/HIVE-7140
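
For reference, the change described above amounts to something like this (a sketch; I'm assuming the property involved is hive.metastore.client.socket.timeout, set either per session or in hive-site.xml):

-- Sketch: raise the metastore client read timeout to 600 seconds.
-- (Older releases read this as a plain number of seconds; newer ones also
-- accept a unit suffix such as 600s.)
SET hive.metastore.client.socket.timeout=600;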





On Tue, Sep 9, 2014 at 1:00 PM, John Omernik j...@omernik.com wrote:

I ran with debug logging, and this is interesting: there was a loss of
connection to the metastore client RIGHT before the partition mention above,
as data was looking to be moved around. I wonder if the timing on that is bad?

14/09/09 12:47:37 [main]: INFO exec.MoveTask: Partition is: {day=null, source=null}
14/09/09 12:47:38 [main]: INFO metadata.Hive: Renaming
  src: maprfs:/user/hive/scratch/hive-mapr/hive_2014-09-09_12-38-30_860_3555291990145206535-1/-ext-1/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0;
  dest: maprfs:/user/hive/warehouse/intel_flow.db/pcaps/day=2012-11-30/source=20121119_SWAirlines_Spam/04_0;
  Status: true
14/09/09 12:48:02 [main]: WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost
connection. Attempting to reconnect.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)


On Tue, Sep 9, 2014 at 11:02 AM, John Omernik j...@omernik.com wrote:

I am doing a dynamic partition load in Hive 0.13 using ORC files. This has
always worked in the past, both with MapReduce v1 and with YARN. I am working
with Mesos now and trying to troubleshoot this weird error:


Failed with exception AlreadyExistsException(message:Partition already exists

What's odd is that my insert is an INSERT (without OVERWRITE), so it's like two
different reducers have data to go into the same partition and then there is a
collision of some sort? Perhaps there is a situation where the partition
doesn't exist prior to the run, but when two reducers have data they both
think they should be the one to create the partition? If a partition already
exists, shouldn't the reducer just copy its file into the partition? I am
struggling to see why this would be an issue with Mesos but not on YARN or
MRv1.
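
For context, the kind of statement involved looks roughly like this (a sketch: the pcaps table and the day/source partition columns are taken from the paths in the log above, everything else is hypothetical):

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic-partition INSERT without OVERWRITE: reducers write files for the
-- (day, source) partitions they receive, and the move/metadata step afterwards
-- registers any partitions that don't exist yet in the metastore.
INSERT INTO TABLE pcaps PARTITION (day, source)
SELECT payload, day, source      -- 'payload' stands in for the real columns
FROM pcaps_staging;              -- hypothetical source table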
Any thoughts would be welcome. 
John





  

Re: Hive splits/adds rows when outputting dataset with new lines

2014-10-07 Thread Maciek
This …works!
I'm quite surprised, as per the steps I outlined the issue manifested even
without CTAS (with a regular SELECT).
I still don't see how that could be related …or are those two separate issues?

Also, maybe you know - is there any way to make it work for TextFile?
Thank you,
Maciek
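
For what it's worth, if changing the global default is undesirable, the format can also be chosen per table in the CTAS itself; a sketch (the with_newlines name is mine), since I don't know of a clean way to represent an embedded newline in a TextFile row:

-- Sketch: store this particular result in a format that can hold newlines,
-- leaving the TextFile default untouched for everything else.
CREATE TABLE with_newlines
STORED AS SEQUENCEFILE          -- or ORC/RCFile; anything except TEXTFILE
AS
SELECT regexp_replace(wordsmerged, 'of', '\nof\n') AS wordsseparate,
       wordsmerged
FROM singlerow;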

On Tue, Oct 7, 2014 at 7:13 AM, Navis류승우 navis@nexr.com wrote:

 Try with set hive.default.fileformat=SequenceFile;

 Thanks,
 Navis

 2014-10-06 20:51 GMT+09:00 Maciek mac...@sonra.io:

 Hello,

 I've encountered a situation where printing newlines corrupts
 (multiplies) the returned dataset.
 This seems similar to HIVE-3012
 https://issues.apache.org/jira/browse/HIVE-3012 (fixed in 0.11), but I'm
 on Hive 0.13 and it's still the case.
 Here are the steps to illustrate/reproduce:

 1. First let's create a table with one row and one column by selecting from
 any existing table (substitute ANYTABLE accordingly):

 CREATE TABLE singlerow AS SELECT 'worldofhostels' wordsmerged FROM
 ANYTABLE LIMIT 1;

 and verify:

 SELECT * FROM singlerow;

 OK---
 worldofhostels

 Time taken: 0.028 seconds, Fetched: 1 row(s)

 All good so far.
 2. Now let's introduce a newline:

 SELECT regexp_replace(wordsmerged, 'of', '\nof\n') wordsseparate FROM
 singlerow;

 OK--

 world
 of
 hostels

 Time taken: 6.404 seconds, Fetched: 3 row(s)
 and I'm suddenly getting 3 rows now.
 3. This is not just CLI output behaviour: when submitting a CTAS, it
 materializes the same corrupted result set:

 CREATE TABLE corrupted AS
 SELECT regexp_replace(wordsmerged, 'of', '\nof\n') wordsseparate,
 wordsmerged FROM singlerow;

 hive> select * from corrupted;

 OK

 world NULL
 of NULL
 hostels worldofhostels

 Time taken: 0.029 seconds, Fetched: 3 row(s)
 Apparently the same thing happens - the new table is split into multiple
 rows, and the columns following the one in question (like wordsmerged)
 become NULLs.
 Am I doing something wrong here?

 Regards,
 Maciek





-- 
Kind Regards
Maciek Kocon