Issues in retrieval of hive data-reg

2014-02-04 Thread Selvi. rceg
I am doing my project on big data. I have installed *HADOOP 1.2.1* and *HIVE
0.11.0* on *UBUNTU 11.10*. I have created tables in Hive and structured
the contents of the tables using Hive. Now I need to retrieve the structured
data. Can you please help me retrieve it? On trying to
find the location of the data, I found it to be
*hdfs://localhost:54310/user/hive/warehouse/table_name*,
but I don't know how to reach this location.

HADOOP- /usr/local/hadoop
HIVE - /usr/local/hadoop/hive 0.11.0
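
A minimal sketch of two ways to reach that warehouse path from the Hive shell;
table_name and the local export directory are placeholders:

-- list the files backing the table (table_name is a placeholder)
dfs -ls hdfs://localhost:54310/user/hive/warehouse/table_name;

-- copy the structured rows out of HDFS into a local directory for further use
-- ('/tmp/table_name_export' is a placeholder path)
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/table_name_export'
SELECT * FROM table_name;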


Re: Optimising mappers for number of nodes

2014-02-04 Thread Lefty Leverenz
Actually that's mapred.max.split.size.  Hive doesn't have a configuration
parameter named hive.max.split.size.
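
A sketch of how that might be applied to the LZO case below; the byte values
are only illustrative, and the properties are the standard split settings used
with CombineHiveInputFormat:

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- cap each combined split at roughly 256 MB so the 12 GB input spreads
-- across the 4 nodes without creating 120 small tasks (values illustrative)
set mapred.max.split.size=268435456;
set mapred.min.split.size.per.node=268435456;
set mapred.min.split.size.per.rack=268435456;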

-- Lefty


On Mon, Feb 3, 2014 at 10:59 AM, Prasanth Jayachandran 
pjayachand...@hortonworks.com wrote:

 Hi

 hive.max.split.size can be tuned to decrease the number of mappers.
 Reference: http://www.slideshare.net/ye.mikez/hive-tuning (slide number
 38)

 Also using CombineHiveInputFormat (the default input format) will combine
 multiple small files to form a larger split and hence fewer mappers.

 Thanks
 Prasanth Jayachandran

 On Feb 3, 2014, at 10:20 AM, KingDavies kingdav...@gmail.com wrote:

 Our platform has a 40GB raw data file that was LZO-compressed (12GB
 compressed) to reduce network IO to and from S3.
 Without indexing the file is unsplittable, resulting in 1 map task and poor
 cluster utilisation.
 After indexing the file to make it splittable, the Hive query produces 120 map
 tasks.
 However, with the 120 tasks distributed over a small 4-node cluster it
 takes longer to process the data than when it wasn't splittable and
 processing was done by a single node (1h20min vs 17min). This was with a
 fairly simple select-from-where query, without distinct, group by or order.
 I'd like to utilise all nodes in the cluster to reduce query time. What's
 the best way to have the data crunched in parallel but with fewer mappers?





collect_list on two columns of the same row

2014-02-04 Thread ZORAIDA HIDALGO SANCHEZ
Dear all,

I need to call one of my UDFs that receives two arrays as parameters. Each of
these arrays is the result of collect_list(col) after a group by. Does the
first position of the first array correspond to the first position of the
second array?

For instance, having this:

col11 col21 col31
col12 col22 col32
col13 col23 col33

I would expect:

select col3,
       result
from (
  select col3, collect_list(col1) as col1_list, collect_list(col2) as col2_list
  from my_table
  group by col3 ) tmp
lateral view my_udf(col1_list, col2_list) tmp as result

so that the UDF is effectively called as my_udf([col11,col12,col13], [col21,col22,col23]).

Is that correct?
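
If the positional correspondence turns out not to be guaranteed, one possible
workaround is to collect a single combined value per row so each pair stays
together; my_udf_pairs and the delimiter here are hypothetical:

select col3, result
from (
  -- keep col1/col2 from the same row glued together before collecting
  select col3, collect_list(concat_ws('|', col1, col2)) as pair_list
  from my_table
  group by col3 ) tmp
-- my_udf_pairs is a hypothetical UDF that splits the pairs again
lateral view my_udf_pairs(pair_list) t as result;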

Thanks.

Zoraida.-





Re: Index not getting used for the queries

2014-02-04 Thread Thilina Gunarathne
Thanks Peter. It helped. That property combined with setting the property
'hive.optimize.index.filter' to 'true' got the index working.
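
For reference, a sketch of that combination of settings; the minsize value is
only illustrative and simply forces the index to be considered even for small
inputs:

-- enable index-based filtering and lower the compact-index size threshold
-- (0 is illustrative; the default is about 5 GB)
set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=0;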

thanks,
Thilina


On Mon, Feb 3, 2014 at 6:12 PM, Peter Marron 
peter.mar...@trilliumsoftware.com wrote:

  Hi,

 Not sure if it is relevant to your problem, but I'm just checking that you
 know about hive.optimize.index.filter.compact.minsize. It's set to 5 GB by
 default, and if the estimated query size is less than this then the index
 won't be used.

 HTH.

 Regards



 *Peter Marron*

 Senior Developer, Research & Development

 Office: +44 *(0) 118-940-7609*  peter.mar...@trilliumsoftware.com

 Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK

 https://www.facebook.com/pages/Trillium-Software/109184815778307
 https://twitter.com/TrilliumSW
 http://www.linkedin.com/company/17710

 *www.trilliumsoftware.com*

 Be Certain About Your Data. Be Trillium Certain.



 *From:* Thilina Gunarathne [mailto:cset...@gmail.com]
 *Sent:* 03 February 2014 16:08
 *To:* user
 *Subject:* Index not getting used for the queries



 Dear all,

 I created a compact index for a table with several hundred million records
 as follows. The table is partitioned by month. The index on A and B was
 created successfully, but I can't see it getting used in the queries. It
 would be great if one of you experts could shed some light on what I am
 missing. I'm using Hive 0.9.

 set hive.exec.parallel=false;
 CREATE INDEX idx_
 ON TABLE (a,b)
 AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
 WITH DEFERRED REBUILD
 COMMENT 'Index for  table. Indexing on A and B';
 ALTER INDEX idx_ on  REBUILD;


 hive> describe ;
 OK
 a                    bigint
 ...
 b                    bigint


 month                int



 hive> show index on ;
 OK
 idx_    a, b
 default___p_idx___compact Index for tm top50
 table. Indexing on A and B


 hive> explain select a,b from tm_top50_p where a=113231 and
 month=201308;
 OK
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME ))) (TOK_INSERT
 (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR
 (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and
 (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308)

 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage

 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 
   TableScan
 alias: 
 Filter Operator
   predicate:
   expr: (a = 113231)
   type: boolean
   Select Operator
 expressions:
   expr: a
   type: bigint
   expr: b
   type: bigint
 outputColumnNames: _col0, _col1
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format:
 org.apache.hadoop.mapred.TextInputFormat
   output format:
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

   Stage: Stage-0
 Fetch Operator
   limit: -1

   thanks a lot,
 Thilina


 --
 https://www.cs.indiana.edu/~tgunarat/
 http://www.linkedin.com/in/thilina

 http://thilina.gunarathne.org




-- 
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina
http://thilina.gunarathne.org

Hive on Windows w/Hadoop 2.2.0

2014-02-04 Thread Ian Jackson
Reading the Wiki (cwiki.apache.org/confluence/display/Hive/GettingStarted), it
states "It does not work on Cygwin." Do further details or hints exist for making
it work on Windows?


RE: Hive on Windows w/Hadoop 2.2.0

2014-02-04 Thread Eric Hanson (BIG DATA)
HDP runs on Windows:

http://hortonworks.com/products/hdp-windows/#install

I don't know if it uses Cygwin or not, but everything needed to run Hive on 
Windows should be in the install package.

Eric

From: Ian Jackson [mailto:ian_jack...@trilliumsoftware.com]
Sent: Tuesday, February 4, 2014 8:55 AM
To: user@hive.apache.org
Subject: Hive on Windows w/Hadoop 2.2.0

Reading the Wiki (cwiki.apache.org/confluence/display/Hive/GettingStarted), it
states "It does not work on Cygwin." Do further details or hints exist for making
it work on Windows?


Re: GenericUDF Testing in Hive

2014-02-04 Thread Raj Hadoop
How do I test a Hive GenericUDF which accepts two parameters, List<T> and T?

List<T> - can it be the output of a collect_set? Please advise.

I have a generic UDF which takes List<T>, T. I want to test how it works
through Hive.





On Monday, January 20, 2014 5:19 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
 
The following is an example of a GenericUDF. I wanted to test this through a
Hive query. Basically I want to pass parameters, something like select
ComplexUDFExample('a','b','c') from employees limit 10.


 
 
https://github.com/rathboma/hive-extension-examples/blob/master/src/main/java/com/matthewrathbone/example/ComplexUDFExample.java
 
 
 
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

class ComplexUDFExample extends GenericUDF {
  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if (!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return new Boolean(true);
    }
    return new Boolean(false);
  }
}
 
 
hive> select ComplexUDFExample('a','b','c') from email_list_1 limit 10;
FAILED: SemanticException [Error 10015]: Line 1:7 Arguments length mismatch
''c'': arrayContainsExample only takes 2 arguments: List<T>, T
 
--
 
How do I test this example in a Hive query? I know I am invoking it wrong, but
how can I invoke it correctly?
 
My requirement is to pass an array of strings as the first argument and another
string as the second argument in Hive, like below.


Select col1, ComplexUDFExample( collect_set(col2) , 'xyz')
from
Employees
Group By col1;

How do I do that?
 
Thanks in advance.
 
Regards,
Raj

Re: GenericUDF Testing in Hive

2014-02-04 Thread Raj Hadoop

I want to do a simple test like this, but it is not working:

select ComplexUDFExample(List(a, b, c), b) from table1 limit 10;


FAILED: SemanticException [Error 10011]: Line 1:25 Invalid function 'List'
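
A sketch of how the call might look instead, assuming the class is registered
as a temporary function; the jar path and the function name complex_udf_example
are placeholders, and array()/collect_set() supply the list argument:

-- /path/to/complex-udf-example.jar is a placeholder
ADD JAR /path/to/complex-udf-example.jar;
CREATE TEMPORARY FUNCTION complex_udf_example
  AS 'com.matthewrathbone.example.ComplexUDFExample';

-- inline list built with the array() built-in
select complex_udf_example(array('a', 'b', 'c'), 'b') from table1 limit 10;

-- or the list produced by collect_set, grouped per key
select col1, complex_udf_example(collect_set(col2), 'xyz')
from Employees
group by col1;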






On Tuesday, February 4, 2014 2:34 PM, Raj Hadoop hadoop...@yahoo.com wrote:
 
How do I test a Hive GenericUDF which accepts two parameters, List<T> and T?

List<T> - can it be the output of a collect_set? Please advise.

I have a generic UDF which takes List<T>, T. I want to test how it works
through Hive.


Hive queries for disk usage analysis

2014-02-04 Thread Mungre,Surbhi
Hello All,

We are doing some analysis for which we need to determine things like size of 
the largest row or size of the largest column. By size, I am referring to disk 
space usage. Does HIVE provide any functions to run such queries?
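
As far as I know there is no built-in per-row disk-usage function, but a rough
sketch for a text-backed table is to measure the serialized width of each row;
my_table, its columns, and the delimiter are placeholders, and the result only
approximates on-disk size:

-- my_table, col_a and col_b are placeholders; characters approximate bytes
-- only for uncompressed, single-byte-encoded text storage
select max(length(concat_ws(',', cast(col_a as string), cast(col_b as string)))) as approx_largest_row_chars,
       max(length(cast(col_a as string))) as approx_largest_col_a_chars
from my_table;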

Thanks,
Surbhi Mungre
Software Engineer
www.cerner.com
