RE: Converting rows into dynamic columns in Hive

2012-08-08 Thread richin.jain
John,

What is R?

-Original Message-
From: ext John Meagher [mailto:john.meag...@gmail.com] 
Sent: Wednesday, August 08, 2012 4:34 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic columns in Hive

I don't think dynamic columns are possible in Hive.  I've always had Hive output 
a structure like your query output and then used R to convert it into a 
dynamic column structure.


On Wed, Aug 8, 2012 at 3:56 PM,   wrote:
> Thanks Ashish, that gives an idea.
>
> But I am not sure about the outer select: I have to know all the 
> values in the Beta column beforehand to do a max on each value.
>
> Is there a better way?
>
>
>
> Richin
>
>
>
> From: ext Ashish Thusoo [mailto:athu...@qubole.com]
> Sent: Tuesday, August 07, 2012 5:05 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic columns in Hive
>
>
>
> you should be able to do this in Hive using a group by on alpha and 
> then a combination of the max and if functions... something along 
> the following lines
>
> select alpha, max(abc), max(pqr), ...
> from (
>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NULL) as pqr, ...
>   from table
> ) t
> group by alpha
>
> something on those lines...
>
> Ashish
>
> On Tue, Aug 7, 2012 at 1:57 PM,  wrote:
>
> Hi All,
>
>
>
> One of my query outputs looks like this:
>
> Alpha  Beta  Gamma
> 123    xyz   1.0
> 123    abc   0.5
> 123    pqr   1.3
> 123    def   2.1
> 456    xyz   0.1
> 456    abc   0.6
> 456    pqr   1.9
> 456    def   3.2
> 456    lmn   1.1
> 456    sdf   1.2
>
> For data visualization purposes I want the output to look like the table 
> below (basically taking rows from one table and making them columns in 
> another table):
>
> Alpha  xyz  abc  pqr  def  lmn  sdf
> 123    1.0  0.5  1.3  2.1
> 456    0.1  0.6  1.9  3.2  1.1  1.2
>
>
>
> Question - Can it be done in Hive? If not, any suggestions.
>
>
>
> Thanks,
>
> Richin
>
>
>
>
>
>


RE: Custom UserDefinedFunction in Hive

2012-08-08 Thread Venkatesh Kavuluri
Actually, since the custom UDF yesterdaydate() mentioned below is NOT marked with the 
annotation @UDFType(deterministic = false), partition pruning should work in 
practice.  The PartitionPruner has logic around this annotation to check whether a 
generic UDF is deterministic, and it skips partition pruning if it finds any 
non-deterministic function.
http://svn.apache.org/repos/asf/hive/branches/branch-0.9/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
I guess this check is implemented to avoid inconsistencies in the result set that 
might arise in scenarios like the one below, where the predicate is part_col = f(time) and 
regular_col = f(time).
The expression involving "part_col" is evaluated at compile time while the 
expression involving "regular_col" is evaluated at run time, so a function like 
yesterdaydate() might return different values if the query is executed around 
midnight.
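For reference, a minimal sketch (not from the original thread; the class and method names are illustrative) of how a UDF declares itself non-deterministic so the PartitionPruner skips pruning on predicates that use it:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Marking the UDF non-deterministic tells the optimizer its value can change
// between invocations, so partition pruning on predicates using it is skipped.
@UDFType(deterministic = false)
public final class CurrentTimeMillis extends UDF {
    public long evaluate() {
        return System.currentTimeMillis();
    }
}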
Thanks,
Venkatesh

Date: Wed, 8 Aug 2012 03:49:56 -0700
From: bejoy...@yahoo.com
Subject: Re: Custom UserDefinedFunction in Hive
To: user@hive.apache.org

Hi Raihan
UDFs are evaluated at run time, when the query is executed. But it is the Hive 
parser, at query parse time, that decides the boundary of the data to be used for the 
query, i.e. which partitions have to be processed. Because of this 
the entire table will be scanned for your query.
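One hedged way to check what the planner actually did (using the table from this thread) is to look at the extended plan, which lists the partitions and paths selected for the scan:

EXPLAIN EXTENDED
SELECT * FROM REALTIME WHERE dt = '20120806' LIMIT 10;
-- If every dt partition shows up in the plan, the predicate did not prune;
-- with a literal like '20120806', only that partition should appear.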
Regards,
Bejoy KS
From: Raihan Jamal 
 To: user@hive.apache.org 
Cc: d...@hive.apache.org 
 Sent: Tuesday, August 7, 2012 10:50 PM
 Subject: Re: Custom UserDefinedFunction in Hive
   

Hi Jan,

 

I figured that out; it is working fine for me now. The only question I have is, 
if I am doing it like this:

 

SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;

 

Then the above query will be evaluated as below, right?

 

SELECT * FROM REALTIME where dt= '20120806' LIMIT 10;

 

So that means it will look for data in the corresponding dt 
partition (20120806) only, right, since the above table is partitioned on the dt column? 
And it will not scan the whole table, right?


Raihan Jamal





On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár  wrote:


Hi Jamal,
Check if the function really returns what it should and that your data are 
really in yyyyMMdd format. You can do this with a simple query like this:
SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;




I don't see anything wrong with the function itself, it works well for me 
(although I tested it in hive 0.7.1). The only thing I would change about it 
would be to optimize it by calling 'new' only at the time of construction and 
reusing the object when the function is called, but that should not affect the 
functionality at all.
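A hedged sketch of what that optimization might look like (reusing the formatter across calls as long as the format string does not change; this is an illustration, not code from the thread):

import java.text.SimpleDateFormat;
import java.util.Calendar;
import org.apache.hadoop.hive.ql.exec.UDF;

public final class YesterdayDate extends UDF {
    private String cachedFormat;
    private SimpleDateFormat dateFormat;

    public String evaluate(final String format) {
        // Build the formatter only when the format string changes
        // (in practice once per query), not on every row.
        if (dateFormat == null || !format.equals(cachedFormat)) {
            dateFormat = new SimpleDateFormat(format);
            cachedFormat = format;
        }
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);
        return dateFormat.format(cal.getTime());
    }
}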




Best regards,

Jan





On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal  wrote:




Problem





I created the UserDefinedFunction below to get yesterday's 
date in the format I want, as I will be passing the format into the 
method from the query.

 

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.hadoop.hive.ql.exec.UDF;

public final class YesterdayDate extends UDF {

    public String evaluate(final String format) {
        DateFormat dateFormat = new SimpleDateFormat(format);
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);
        return dateFormat.format(cal.getTime()).toString();
    }
}

 

 

So whenever I try to run a query like the one below, after adding the jar 
to the classpath and creating the temporary function yesterdaydate, I always get 
zero results back:

 

hive> create temporary function yesterdaydate as
'com.example.hive.udf.YesterdayDate';

OK

Time taken: 0.512 seconds

 

Below is the query I am running-

 

hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;

OK

 

And I always get zero results back, but the data is there in that 
table for Aug 5th.

 

What am I doing wrong? Any suggestions will be appreciated.

 

 

NOTE: As I am working with Hive 0.6, which doesn't support 
variable substitution, I cannot use hiveconf here. The above table 
has been partitioned on the dt (date) column.







  

Increase max tablename length?

2012-08-08 Thread Keith Wiley
The default seems to be 128.  Can it be increased?  I haven't found a 
configuration parameter for that yet.
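For what it's worth, the 128 limit appears to come from the metastore schema rather than a Hive configuration setting: TBL_NAME is defined as VARCHAR(128) in the TBLS table of the stock schema scripts. A heavy-handed, untested workaround on a MySQL-backed metastore would be to widen that column:

-- Assumption: MySQL-backed metastore. Back up the metastore first; untested sketch.
ALTER TABLE TBLS MODIFY TBL_NAME VARCHAR(256);

Whether the rest of Hive copes with longer names is a separate question.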


Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Re: Converting rows into dynamic columns in Hive

2012-08-08 Thread John Meagher
I don't think dynamic columns are possible in Hive.  I've always
had Hive output a structure like your query output and then used R to
convert it into a dynamic column structure.


On Wed, Aug 8, 2012 at 3:56 PM,   wrote:
> Thanks Ashish, that gives an idea.
>
> But I am not sure about the outer select: I have to know all the values
> in the Beta column beforehand to do a max on each value.
>
> Is there a better way?
>
>
>
> Richin
>
>
>
> From: ext Ashish Thusoo [mailto:athu...@qubole.com]
> Sent: Tuesday, August 07, 2012 5:05 PM
> To: user@hive.apache.org
> Subject: Re: Converting rows into dynamic columns in Hive
>
>
>
> you should be able to do this in Hive using a group by on alpha and then
> a combination of the max and if functions... something along the
> following lines
>
> select alpha, max(abc), max(pqr), ...
> from (
>   select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NULL) as pqr, ...
>   from table
> ) t
> group by alpha
>
> something on those lines...
>
> Ashish
>
> On Tue, Aug 7, 2012 at 1:57 PM,  wrote:
>
> Hi All,
>
>
>
> One of my query outputs looks like this:
>
> Alpha  Beta  Gamma
> 123    xyz   1.0
> 123    abc   0.5
> 123    pqr   1.3
> 123    def   2.1
> 456    xyz   0.1
> 456    abc   0.6
> 456    pqr   1.9
> 456    def   3.2
> 456    lmn   1.1
> 456    sdf   1.2
>
> For data visualization purposes I want the output to look like the table
> below (basically taking rows from one table and making them columns in
> another table):
>
> Alpha  xyz  abc  pqr  def  lmn  sdf
> 123    1.0  0.5  1.3  2.1
> 456    0.1  0.6  1.9  3.2  1.1  1.2
>
>
>
> Question – Can it be done in Hive? If not, any suggestions.
>
>
>
> Thanks,
>
> Richin
>
>
>
>
>
>


RE: Converting rows into dynamic columns in Hive

2012-08-08 Thread richin.jain
Thanks Ashish, that gives an idea.
But I am not sure about the outer select: I have to know all the values in 
the Beta column beforehand to do a max on each value.
Is there a better way?
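One hedged workaround, not suggested in the thread: enumerate the Beta values first with a separate query, then generate the pivot from that list. A sketch, assuming the source table is called my_table and using the sample values from this thread:

-- Step 1: discover the values that will become columns
SELECT DISTINCT beta FROM my_table;

-- Step 2: write (or generate) the pivot from that list, spelled out here
-- for the sample values xyz, abc, pqr, def, lmn, sdf
SELECT alpha,
       MAX(IF(beta = 'xyz', gamma, NULL)) AS xyz,
       MAX(IF(beta = 'abc', gamma, NULL)) AS abc,
       MAX(IF(beta = 'pqr', gamma, NULL)) AS pqr,
       MAX(IF(beta = 'def', gamma, NULL)) AS def,
       MAX(IF(beta = 'lmn', gamma, NULL)) AS lmn,
       MAX(IF(beta = 'sdf', gamma, NULL)) AS sdf
FROM my_table
GROUP BY alpha;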

Richin

From: ext Ashish Thusoo [mailto:athu...@qubole.com]
Sent: Tuesday, August 07, 2012 5:05 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic columns in Hive

you should be able to do this in Hive using a group by on alpha and then a 
combination of the max and if functions... something along the following lines

select alpha, max(abc), max(pqr), ...
from (
  select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NULL) as pqr, ...
  from table
) t
group by alpha

something on those lines...

Ashish
On Tue, Aug 7, 2012 at 1:57 PM, <richin.j...@nokia.com> wrote:
Hi All,

One of my query outputs looks like this:

Alpha  Beta  Gamma
123    xyz   1.0
123    abc   0.5
123    pqr   1.3
123    def   2.1
456    xyz   0.1
456    abc   0.6
456    pqr   1.9
456    def   3.2
456    lmn   1.1
456    sdf   1.2

For data visualization purposes I want the output to look like the table below 
(basically taking rows from one table and making them columns in another table):

Alpha  xyz  abc  pqr  def  lmn  sdf
123    1.0  0.5  1.3  2.1
456    0.1  0.6  1.9  3.2  1.1  1.2

Question - Can it be done in Hive? If not, any suggestions.

Thanks,
Richin





Status of HIVE-1434

2012-08-08 Thread Filippo Diotalevi
Hi, 
I'm trying to integrate Hive into our Cassandra-Hadoop stack, but I'm finding it 
difficult to understand what the right approach is.

As far as I can see, there are two JIRA issues open (one in Hive, one in Cassandra): 
https://issues.apache.org/jira/browse/CASSANDRA-4131
https://issues.apache.org/jira/browse/HIVE-1434


but neither of them is committed.

Is patching either Hive or Cassandra the only way to integrate them? Is there any 
plan to commit HIVE-1434 to Hive trunk? 

Thanks,
-- 
Filippo Diotalevi




Question about querying JSON data

2012-08-08 Thread Tom Brown
I have a large amount of JSON data that was generated using
periods in the key names, e.g., {"category.field": "value"}.  I know
that's not the best way to do JSON, but for better or worse, it's the
data I have to deal with.

I have tried using get_json_object, but I am concerned that its JSON
path expressions interpret "." as a special character. I am also
concerned about the overhead of repeatedly parsing each record (each
record is about 2K, so not tiny, but not huge either).
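To make the first concern concrete: in JSON path syntax the dot is a child operator, so a literal key that contains a dot cannot be addressed that way. A small illustration (json_col and js are hypothetical names):

-- $.category.field means "field nested inside category", so for a record like
-- {"category.field": "value"} this returns NULL rather than "value".
SELECT get_json_object(json_col, '$.category.field') FROM js LIMIT 1;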

I have tried using Hive-JSON-Serde but it seems to require that my
column names be named the same as my JSON field names.

I had heard that there was a serde somewhere that will allow me to
specify a JSON path to map to each specific field name, but other than
vague references on the mailing list, I haven't found any concrete
info about it.

I would prefer to use existing code, but I can write my own SerDe if I have to.

What do you recommend?

Thanks in advance!

--Tom


Re: Hive and joins

2012-08-08 Thread Raghunath, Ranjith
Thanks Bejoy. What if those values a and b are not static and are housed in a 
table, much like a type 2 dimension? Is the current solution still to put the 
condition in the WHERE clause?
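A hedged sketch of the pattern for a type 2 dimension (table and column names here are made up): keep the equality on the business key in the ON clause and push the effective-date range check into WHERE:

SELECT f.*, d.attribute
FROM fact f
JOIN dim_scd2 d
  ON f.cust_id = d.cust_id                -- equality part stays in ON
WHERE f.event_dt >= d.effective_from      -- BETWEEN-style range moves to WHERE
  AND f.event_dt <= d.effective_to;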

Thanks,
Ranjith

From: Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Wednesday, August 08, 2012 06:55 AM
To: user@hive.apache.org 
Subject: Re: Hive and joins

Hi Ranjith

BETWEEN a AND b you can implement as >= a, <= b. Since that is not an equality 
condition, you cannot use it in the ON clause; you need to move it to the WHERE 
clause in your query.

Regards,
Bejoy KS


From: "Raghunath, Ranjith" 
To: "'user@hive.apache.org'" 
Sent: Wednesday, August 8, 2012 8:39 AM
Subject: Hive and joins

Since Hive (0.7.1) only supports equi-joins, how are people using it for 
joins employing the BETWEEN clause?

Thanks,
Ranjith



Re: Hive and joins

2012-08-08 Thread Bejoy Ks
Hi Ranjith

BETWEEN a AND b you can implement as >= a, <= b. Since that is not an equality 
condition, you cannot use it in the ON clause; you need to move it to the WHERE 
clause in your query.


 
Regards,
Bejoy KS



 From: "Raghunath, Ranjith" 
To: "'user@hive.apache.org'"  
Sent: Wednesday, August 8, 2012 8:39 AM
Subject: Hive and joins
 
Since Hive (0.7.1) only supports equi-joins, how are people using it for 
joins employing the BETWEEN clause?

Thanks,
Ranjith      

Re: Custom UserDefinedFunction in Hive

2012-08-08 Thread Bejoy Ks
Hi Raihan

UDFs are evaluated at run time, when the query is executed. But it is the Hive 
parser, at query parse time, that decides the boundary of the data to be used for the 
query, i.e. which partitions have to be processed. Because of this 
the entire table will be scanned for your query.


 
Regards,
Bejoy KS



 From: Raihan Jamal 
To: user@hive.apache.org 
Cc: d...@hive.apache.org 
Sent: Tuesday, August 7, 2012 10:50 PM
Subject: Re: Custom UserDefinedFunction in Hive
 

Hi Jan,
I figured that out; it is working fine for me now. The only question I have 
is, if I am doing it like this:
 
SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
 
Then the above query will be evaluated as below, right?
 
SELECT * FROM REALTIME where dt= '20120806' LIMIT 10;
 
So that means it will look for data in the corresponding dt 
partition (20120806) only, right, since the above table is partitioned on the dt column? 
And it will not scan the whole table, right?



Raihan Jamal



On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár  wrote:

> Hi Jamal,
>
> Check if the function really returns what it should and that your data are
> really in yyyyMMdd format. You can do this with a simple query like this:
>
> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>
>
>I don't see anything wrong with the function itself, it works well for me 
>(although I tested it in hive 0.7.1). The only thing I would change about it 
>would be to optimize it by calling 'new' only at the time of construction and 
>reusing the object when the function is called, but that should not affect the 
>functionality at all.
>
>
>Best regards,
>Jan
>
>
>
>
>
>
>
>On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal  wrote:
>
>> Problem
>>
>> I created the UserDefinedFunction below to get yesterday's
>> date in the format I want, as I will be passing the format into the
>> method from the query.
>> 
>> public final class YesterdayDate extends UDF {
>>
>>     public String evaluate(final String format) {
>>         DateFormat dateFormat = new SimpleDateFormat(format);
>>         Calendar cal = Calendar.getInstance();
>>         cal.add(Calendar.DATE, -1);
>>         return dateFormat.format(cal.getTime()).toString();
>>     }
>> }
>> 
>> 
>> So whenever I try to run a query like the one below, after adding the jar
>> to the classpath and creating the temporary function yesterdaydate, I always get
>> zero results back:
>> 
>>hive> create temporary function yesterdaydate as
'com.example.hive.udf.YesterdayDate';
>>OK
>>Time taken: 0.512 seconds
>> 
>>Below is the query I am running-
>> 
>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>OK
>> 
>> And I always get zero results back, but the data is there in that
>> table for Aug 5th.
>> 
>> What am I doing wrong? Any suggestions will be appreciated.
>> 
>> 
>> NOTE: As I am working with Hive 0.6, which doesn't support
>> variable substitution, I cannot use hiveconf here. The above table
>> has been partitioned on the dt (date) column.
>

RE: Special character replaced by '?'

2012-08-08 Thread Balaraman, Anand
Any comments/information on the below issue would be highly appreciated :)

 

From: Balaraman, Anand [mailto:anand_balara...@syntelinc.com] 
Sent: 06 August 2012 18:25
To: user@hive.apache.org
Subject: Special character replaced by '?'

 

Hi

 

I am facing an issue while viewing special characters (such as é) using Hive.

If I view the file in HDFS (using the hadoop fs -cat command), it is displayed 
correctly as 'é', but when I select the data using Hive, this character alone 
gets replaced by a question mark.

 

Do we have any solution for handling this?

 

Regards

Anand B

 

