Re: Handling hyphens in table/database/usernames

2011-10-31 Thread Sriram Krishnan
Thanks Jander.

Not supporting "-" in database/table names is fine; we can work around that.
What about usernames? As I said, our main problem is that we have a user with
a hyphenated username, and I would like to find out whether there is a "clean"
way to add him to a certain role, or to grant him specific privileges using a
GRANT statement (see the 2nd example).

I have found a hacky way to do this, which is to update the metastore
directly: the DB_PRIVS and ROLE_MAPS tables can be updated for this
particular user. But I was hoping to use the Hive command line rather than
update the metastore directly.

Cheers,
Sriram

PS: Not a huge deal, but is there a reason why Hive doesn't support hyphens in
database/table names?

From: Jander g <jande...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Mon, 31 Oct 2011 23:05:13 -0700
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Handling hyphens in table/database/usernames

Hive does not support '-' in database names; the name pattern is "[\\w_]+", so
you can use '_' instead.

On Tue, Nov 1, 2011 at 9:29 AM, Sriram Krishnan <skrish...@netflix.com> wrote:
Hi,

Does anyone know how to handle hyphens in any of the following? Specifically, 
we have a user with a hyphenated username – and I can't figure out a way to add 
him to a certain role. No amount of escaping my hyphen seems to help. I am 
using Hive version 0.7.1.

Any ideas would be appreciated.

Thanks,
Sriram

hive> create database if not exists sri-krish;
FAILED: Parse Error: line 1:33 mismatched input '-' expecting EOF near 'sri'

hive> grant all on database default to user sri-krish;
FAILED: Parse Error: line 1:41 mismatched input '-' expecting EOF near 'sri'




--
Thanks,
Jander



Re: Handling hyphens in table/database/usernames

2011-10-31 Thread Jander g
Hive does not support '-' in database names; the name pattern is "[\\w_]+",
so you can use '_' instead.
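A minimal Java sketch of the name check described above, assuming the pattern "[\\w_]+" is applied as a full match against the whole name. Since \w in Java regex already includes '_', the pattern accepts only letters, digits and underscores. The class and method names (HiveNameCheck, isValidDbName, hyphensToUnderscores) are hypothetical and this is not Hive's own code; note also that the "Parse Error" messages in the original question come from the HiveQL parser itself, before any name validation of this kind.

```java
import java.util.regex.Pattern;

public class HiveNameCheck {
    // Pattern quoted in the reply. \w already covers '_', so this is
    // equivalent to \w+ (ASCII letters, digits and underscore).
    private static final Pattern DB_NAME = Pattern.compile("[\\w_]+");

    // Hypothetical helper: true if the whole name matches the pattern.
    public static boolean isValidDbName(String name) {
        return DB_NAME.matcher(name).matches();
    }

    // Hypothetical helper: the suggested workaround of using '_' instead of '-'.
    public static String hyphensToUnderscores(String name) {
        return name.replace('-', '_');
    }

    public static void main(String[] args) {
        String[] names = {"sri-krish", "sri_krish", "default"};
        for (String n : names) {
            System.out.println(n + " -> " + (isValidDbName(n) ? "ok" : "rejected"));
        }
        System.out.println(hyphensToUnderscores("sri-krish"));
    }
}
```

Running main reports sri-krish as rejected and sri_krish and default as ok, then prints sri_krish.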

On Tue, Nov 1, 2011 at 9:29 AM, Sriram Krishnan wrote:

> Hi,
>
> Does anyone know how to handle hyphens in any of the following?
> Specifically, we have a user with a hyphenated username – and I can't
> figure out a way to add him to a certain role. No amount of escaping my
> hyphen seems to help. I am using Hive version 0.7.1.
>
> Any ideas would be appreciated.
>
> Thanks,
> Sriram
>
> hive> create database if not exists sri-krish;
> FAILED: Parse Error: line 1:33 mismatched input '-' expecting EOF near 'sri'
>
> hive> grant all on database default to user sri-krish;
> FAILED: Parse Error: line 1:41 mismatched input '-' expecting EOF near 'sri'
>
>


-- 
Thanks,
Jander


Handling hyphens in table/database/usernames

2011-10-31 Thread Sriram Krishnan
Hi,

Does anyone know how to handle hyphens in any of the following? Specifically, 
we have a user with a hyphenated username – and I can't figure out a way to add 
him to a certain role. No amount of escaping my hyphen seems to help. I am 
using Hive version 0.7.1.

Any ideas would be appreciated.

Thanks,
Sriram

hive> create database if not exists sri-krish;
FAILED: Parse Error: line 1:33 mismatched input '-' expecting EOF near 'sri'

hive> grant all on database default to user sri-krish;
FAILED: Parse Error: line 1:41 mismatched input '-' expecting EOF near 'sri'



What is best way to load data into hive tables/hadoop file system

2011-10-31 Thread Shantian Purkad
Hello,

We have multiple terabytes of data (currently gzipped, approx. 2 GB per
file). What is the best way to load that data into Hadoop?

We have seen (especially when loading with Hive's LOAD DATA LOCAL INPATH)
that a gz file takes around 12 seconds to load, whereas the same file
decompressed (around 4~5 GB) takes 8 minutes.

We want these files to be processed by multiple mappers on Hadoop, not by a
single one.

What would be the best way to load these files into Hive/HDFS so that loading
takes less time and the files can still be processed by multiple mappers?
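A rough back-of-the-envelope sketch of the mapper counts involved, assuming Hadoop 0.20's TextInputFormat (which does not split gzip-compressed files, so each .gz file goes to one map task) and an assumed 64 MB HDFS block size (the 0.20 default for dfs.block.size; your cluster may differ). It ignores details such as FileInputFormat's split slop factor and Hive's combined input formats, so treat the numbers as estimates. MapperEstimate and estimateMappers are hypothetical names.

```java
public class MapperEstimate {
    public static final long MB = 1024L * 1024L;

    // Rough estimate: one map task per block-sized split for splittable input,
    // and one map task per file for non-splittable input such as .gz.
    public static long estimateMappers(long fileBytes, long blockBytes, boolean splittable) {
        if (!splittable) {
            return 1;
        }
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long block = 64 * MB; // assumption: Hadoop 0.20 default block size
        System.out.println("2 GB .gz file:   " + estimateMappers(2048 * MB, block, false));
        System.out.println("4 GB plain file: " + estimateMappers(4096 * MB, block, true));
        System.out.println("5 GB plain file: " + estimateMappers(5120 * MB, block, true));
    }
}
```

Under these assumptions a 2 GB .gz file yields 1 map task, while the 4~5 GB uncompressed version yields roughly 64 to 80, which is the trade-off against the longer load time described in the question.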


Thanks and Regards,
Shantian


Re: testing out Phabricator for code review

2011-10-31 Thread John Sichi
Marek added support for svn, so that is working now too... give it a try!
The instructions have been updated at

https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview

JVS

On Oct 26, 2011, at 10:49 PM,  wrote:

> I've put up instructions for how anyone can start using Phabricator for code 
> review:
> 
> https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview
> 
> We've tested out the git workflows; still working on svn.
> 
> Feedback on how it works for you, anything you noticed missing, etc is 
> appreciated.
> 
> JVS
> 
> On Oct 20, 2011, at 2:00 PM,  wrote:
> 
>> Hey all,
>> 
>> Earlier this year, Facebook released a bunch of its code browsing/review 
>> tools as a new (and independent) open source project called Phabricator:
>> 
>> http://phabricator.org/
>> 
>> We're currently experimenting with using it for improving the developer 
>> experience when contributing and reviewing Hive and HBase patches.  (Also 
>> for eliminating committer confusion from different patch versions submitted 
>> to Review Board and JIRA, something which has bitten us a few times already.)
>> 
>> You may notice some of this activity showing up in JIRA, e.g.
>> 
>> https://issues.apache.org/jira/browse/HIVE-2515
>> 
>> I'll be sending out a lot more info once we've finished some of the setup 
>> and validation, but I just wanted to send out a heads-up for those who are 
>> already familiar with using the existing Review Board setup.  Once 
>> validation is done, we'll publish instructions so that everyone can test it 
>> out for themselves as a potential alternative to Review Board.
>> 
>> JVS
>> 
> 



pass entire row as parameter in hive UDF

2011-10-31 Thread Chen Song
Hi All

In Hive, I would like to write a UDF that accepts a sequence of parameters.
Because the number of parameters is large, and the particular function I am
writing is specific to a set of tables (joined in some way in the SQL), I am
wondering if there is a way to pass the entire row as a wildcard parameter and
then query its fields in the UDF, as in the example below.


select my_function(*) as my_column from t1, t2, etc where [a set of join conditions].

I did some investigation and found that there is a JIRA issue open for this.

https://issues.apache.org/jira/browse/HIVE-1459

This ticket was opened as a follow-up to HIVE-287 to support star expansion in
general, and it still seems to be open. If anyone knows a way to pass the
entire row as a context to a UDF, that would be very helpful.
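To illustrate the shape of what my_function(*) would have to expand to, here is a plain-Java sketch of a variable-argument function that receives every column value of a row positionally and then inspects its fields. It does not use Hive's UDF API, and it makes no assumption about whether Hive 0.7 resolves varargs evaluate() methods; myFunction and RowFunctionSketch are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class RowFunctionSketch {
    // Hypothetical function: receives all columns of a (joined) row in
    // positional order and queries its fields, here just counting them.
    public static String myFunction(Object... row) {
        List<Object> fields = Arrays.asList(row);
        long nonNull = fields.stream().filter(Objects::nonNull).count();
        return fields.size() + " fields, " + nonNull + " non-null";
    }

    public static void main(String[] args) {
        // Columns listed explicitly, as a query must do without star expansion.
        System.out.println(myFunction("a", 1, null, 2.5));
    }
}
```

Until star expansion is supported, the query would need to list the columns explicitly, e.g. select my_function(t1.a, t1.b, t2.c) as my_column from ..., which is exactly the positional list such a function receives.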


Regards,
Chen


Hadoop Counters

2011-10-31 Thread Jason Rutherglen
How can one integrate Hadoop counters into Hive?  Thanks!


Does hive support running on an existing NFS

2011-10-31 Thread Bing Li
When I distribute Hive to an NFS share and execute a SELECT command, it fails:

hive> SELECT a.foo FROM invites a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201110310722_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201110310722_0001
Kill Command = /home/libing/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201110310722_0001
2011-10-31 07:30:57,717 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201110310722_0001 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRe

Then I took a look at the Hadoop log:

2011-10-31 07:30:54,390 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201110310722_0001_m_02_3: java.io.FileNotFoundException: File /tmp/hive-biadmin/hive_2011-10-31_07-30-05_610_9164994782186337826/-mr-10003/990312dc-3241-4cc7-b9f6-018beb1739bb does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:163)

I'm not sure whether it's a defect in Hive or in Hadoop. The Hadoop version I
used is 0.20.2.

High number of input files problems

2011-10-31 Thread Florin Diaconeasa
Hello,

Lately our user base has grown, so our input files have increased
considerably in both size and number.

One of our processing steps runs a query of the form shown at the end
of this email. My problem is that the processing apparently sometimes
misses some of the input files (in most cases for the 2nd SELECT).

I'm using Hive 0.6 and Hadoop 0.20.2 on Debian 5 64-bit, and we connect
to a Hive server instance using JDBC. Any ideas on which parameters I could
tune, or any tickets that have been opened on this problem? I have searched the
Hive JIRA without finding anything so far. The only issue I think might be
related is https://issues.apache.org/jira/browse/HIVE-1884

SELECT
  t.a,
  sum(t.b),
  sum(t.c),
  sum(t.d)
FROM
(
  SELECT
    a,
    sum(x) as b,
    sum(y) as c,
    sum(z) as d
  FROM T1
  WHERE ...
  GROUP BY ...

  UNION ALL

  SELECT
    a,
    sum(x) as b,
    sum(y) as c,
    sum(z) as d
  FROM T2
  WHERE ...
  GROUP BY ...

  UNION ALL

  SELECT
    a,
    sum(x) as b,
    sum(y) as c,
    sum(z) as d
  FROM T3
  WHERE ...
  GROUP BY ...
) t
GROUP BY ...



-- 


Florin