Re: Exceptions while running Hive

2013-11-10 Thread Sonal Goyal
Just looking at the stack trace, I think the protostuff jars are being
referenced by your custom Hive SerDe, so I doubt it could have worked
earlier without them.
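As a rough illustration (the jar name and path below are assumptions, not
something from your environment), you could make the dependency visible to the
query and its map tasks with something like:

  ADD JAR /path/to/protostuff-core.jar;
  -- or register it permanently via hive.aux.jars.path in hive-site.xml

If the class then resolves, that confirms the SerDe was pulling it in.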

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

 <http://in.linkedin.com/in/sonalgoyal>




On Sat, Nov 9, 2013 at 1:39 AM, Narla,Venkatesh
wrote:

>  Thanks Sonal
>
>  I checked the classpath of Hive and I do not find the jar. But is this
> a compulsory jar that needs to be on the Hive classpath? I did not face the
> problem earlier, even without this jar. Could you help me with some
> information on it? Thank you for your time.
>
>
>  Regards
> VN
>  From: Sonal Goyal 
> Reply-To: "user@hive.apache.org" 
> Date: Thursday, November 7, 2013 10:50 AM
> To: "user@hive.apache.org" 
> Subject: Re: Exceptions while running Hive
>
>   Does your hive classpath contain the jar having
> com.dyuproject.protostuff.Schema ?
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
> On Thu, Nov 7, 2013 at 9:16 PM, Narla,Venkatesh <
> venkatesh.na...@cerner.com> wrote:
>
>>  Hello,
>>
>>
>>  I am getting the following exception when I try to run a query. Can any
>> body help me what the problem might be in this scenario.
>>
>>  Thanks for your time.
>>
>>
>>  2013-11-07 09:23:38,498 FATAL ExecMapper: java.lang.NoClassDefFoundError: 
>> com/dyuproject/protostuff/Schema
>>  at com.cerner.kepler.directory.Directory.initialize(Directory.java:240)
>>  at com.cerner.kepler.directory.Directory.getInstance(Directory.java:227)
>>  at 
>> com.cerner.kepler.entity.hbase.EntityTypeStore.remoteLoadTypeMap(EntityTypeStore.java:256)
>>  at 
>> com.cerner.kepler.entity.hbase.EntityTypeStore.loadTypeMap(EntityTypeStore.java:152)
>>  at 
>> com.cerner.kepler.entity.hbase.EntityTypeStore.getTypeFromId(EntityTypeStore.java:637)
>>  at 
>> com.cerner.kepler.entity.hbase.EntityTypeStore.toEntityKey(EntityTypeStore.java:584)
>>  at 
>> com.cerner.kepler.entity.hbase.HBaseKeyEncoder.toKey(HBaseKeyEncoder.java:48)
>>  at 
>> com.cerner.kepler.hive.KeplerCompositeKey.init(KeplerCompositeKey.java:99)
>>  at 
>> org.apache.hadoop.hive.hbase.RawExtensionLazyRow.uncheckedGetField(RawExtensionLazyRow.java:232)
>>  at 
>> org.apache.hadoop.hive.hbase.RawExtensionLazyRow.getField(RawExtensionLazyRow.java:163)
>>  at 
>> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:229)
>>  at 
>> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
>>  at 
>> org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.evaluate(ExprNodeFieldEvaluator.java:80)
>>  at 
>> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:64)
>>  at 
>> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:38)
>>  at 
>> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
>>  at 
>> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:64)
>>  at 
>> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
>>  at 
>> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
>>  at 
>> org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:118)
>>  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>>  at 
>> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>>  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.

Re: Exceptions while running Hive

2013-11-07 Thread Sonal Goyal
Does your hive classpath contain the jar having
com.dyuproject.protostuff.Schema
?

Best Regards,
Sonal
Nube Technologies 






On Thu, Nov 7, 2013 at 9:16 PM, Narla,Venkatesh
wrote:

>  Hello,
>
>
>  I am getting the following exception when I try to run a query. Can
> anybody help me understand what the problem might be in this scenario?
>
>  Thanks for your time.
>
>
>  2013-11-07 09:23:38,498 FATAL ExecMapper: java.lang.NoClassDefFoundError: 
> com/dyuproject/protostuff/Schema
>   at com.cerner.kepler.directory.Directory.initialize(Directory.java:240)
>   at com.cerner.kepler.directory.Directory.getInstance(Directory.java:227)
>   at 
> com.cerner.kepler.entity.hbase.EntityTypeStore.remoteLoadTypeMap(EntityTypeStore.java:256)
>   at 
> com.cerner.kepler.entity.hbase.EntityTypeStore.loadTypeMap(EntityTypeStore.java:152)
>   at 
> com.cerner.kepler.entity.hbase.EntityTypeStore.getTypeFromId(EntityTypeStore.java:637)
>   at 
> com.cerner.kepler.entity.hbase.EntityTypeStore.toEntityKey(EntityTypeStore.java:584)
>   at 
> com.cerner.kepler.entity.hbase.HBaseKeyEncoder.toKey(HBaseKeyEncoder.java:48)
>   at 
> com.cerner.kepler.hive.KeplerCompositeKey.init(KeplerCompositeKey.java:99)
>   at 
> org.apache.hadoop.hive.hbase.RawExtensionLazyRow.uncheckedGetField(RawExtensionLazyRow.java:232)
>   at 
> org.apache.hadoop.hive.hbase.RawExtensionLazyRow.getField(RawExtensionLazyRow.java:163)
>   at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:229)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.evaluate(ExprNodeFieldEvaluator.java:80)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:64)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:38)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:64)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:118)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.ClassNotFoundException: com.dyuproject.protostuff.Schema
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   ... 35 more
>

Re: Where to get hive serde jars.

2013-10-16 Thread Sonal Goyal
You can get the SerDe from https://github.com/cloudera/cdh-twitter-example. I
am not sure if there is a prebuilt version in any Cloudera repo, but you can 
check with the Cloudera team. 
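Once you have built that project, a rough sketch of wiring it in (the jar path
and file name below are assumptions based on a local Maven build, not a
published artifact) would be:

  ADD JAR /path/to/hive-serdes-1.0-SNAPSHOT.jar;

  CREATE EXTERNAL TABLE IF NOT EXISTS power_raw_json (value STRING)
  ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
  LOCATION 's3n://verivox-rawdata/strom/version1/2013_10_1/';

With the jar on the classpath, the "Cannot validate serde" error should go away.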



Sent from my iPad

On Oct 16, 2013, at 7:12 PM, Panshul Whisper  wrote:

> Hello,
> 
> I am trying to implement a SerDe in Hive for reading JSON files directly into
> my Hive tables.
> I am using Cloudera Hue for querying the Hive server.
> 
> Where can I get the Cloudera Hive SerDe jars from?
> Or am I missing something else?
> 
> when I create a table with the following statement: 
> 
> create external table if not exists power_raw_json (
>   value STRING
>   )
>   ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>   Location 's3n://verivox-rawdata/strom/version1/2013_10_1/';
> 
> I get the following error:
> 
> FAILED: Error in metadata: Cannot validate serde: 
> com.cloudera.hive.serde.JSONSerDe
> 
> My guess is that I need to add the SerDe jar to the classpath of the Hive
> server, but where do I find the jars?
> 
> Thanking you for the help.
> -- 
> Regards,
> Ouch Whisper
> 010101010101


Re: Hive to HDFS directly using INSERT OVERWRITE DIRECTORY Imcompatible issue

2013-10-14 Thread Sonal Goyal
You could create an external table at a location of your choice with the
format desired. Then do a select into the table. The data at the location
of your table will be in the format you desire, which you can copy over.
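A minimal sketch (the table, column, and path names here are made up for
illustration):

  CREATE EXTERNAL TABLE export_text (txn_id STRING, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE
  LOCATION '/user/hive/exports/transactions';

  INSERT OVERWRITE TABLE export_text
  SELECT txn_id, amount FROM transactions_source;

The files under /user/hive/exports/transactions are then plain tab-delimited
text, which a downstream job using TextInputFormat can read directly.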

Best Regards,
Sonal
Nube Technologies 






On Tue, Oct 15, 2013 at 6:04 AM, Sonya Ling  wrote:

> Hi:
>
>
> Currently, our hive_to_hdfs function has two parts. The first part
> retrieves transaction records from Hive and puts them into a temporary file
> in the local file system. The second part puts the temporary file from the
> local file system into HDFS. The second part runs on the NameNode, outside
> the Hadoop process, and takes time. I would like to make the hive_to_hdfs
> function write directly using
>
> INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...
>
> I did speed up the process using the direct write above. However, I found
> that the step that follows cannot process the generated data due to its
> unexpected format. That following step expects TextInputFormat. I checked the
> Hive Language Manual. It says:
> "Data written to the filesystem is serialized as text with columns
> separated by ^A and rows separated by newlines. If any of the columns are
> not of primitive type, then those columns are serialized to JSON format."
>
> How can I make them compatible? It does not look like I have a way to
> change the default format that is generated. What can I set the InputFormat
> to in order to make them compatible?
>
> Thanks.
>
>


Re: Wikipedia Dump Analysis..

2013-10-08 Thread Sonal Goyal
Hi Ajeet,

Unfortunately, many of us are not familiar enough with the Wikipedia format
to know where the contributor information is coming from. If you could please
highlight that and let us know where you are stuck with Hive, we could
throw out some ideas.

Sonal

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Tue, Oct 8, 2013 at 6:39 AM, Ajeet S Raina  wrote:

> Any suggestion??
> On 7 Oct 2013 11:24, "Ajeet S Raina"  wrote:
>
>> I was just trying to see if some interesting analysis is possible or
>> not. One thing that came to mind was tracking contributors, so I just
>> thought about that.
>>
>> Is it really possible?
>> On 7 Oct 2013 11:13, "Ajeet S Raina"  wrote:
>>
>>> I could see that revision history could be the target factor but no idea
>>> how to go for it. Any suggestion?
>>> On 7 Oct 2013 10:34, "Sonal Goyal"  wrote:
>>>
>>>> Sorry, where is the contributor information coming from?
>>>>
>>>> Best Regards,
>>>> Sonal
>>>> Nube Technologies <http://www.nubetech.co>
>>>>
>>>> <http://in.linkedin.com/in/sonalgoyal>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina wrote:
>>>>
>>>>>  > Hello,
>>>>> >
>>>>> >
>>>>> >
>>>>> > I have Hadoop running on HDFS with Hive installed. I am able to
>>>>> import Wikipedia dump into HDFS through the below command:
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
>>>>> >
>>>>> >
>>>>> >
>>>>> > $ hadoop jar out.jar
>>>>> edu.umd.cloud9.collection.wikipedia.DumpWikipediaToPlainText -input
>>>>> /home/wikimedia/input/ enwiki-latest-pages-articles.xml  -output
>>>>> /home/wikimedia/output/3
>>>>> >
>>>>> >
>>>>> >
>>>>> > I am able to run Hive for the Wikipedia dump through this command:
>>>>> >
>>>>> >
>>>>> >
>>>>> > I have created one sample hive table based on small data I converted:
>>>>> >
>>>>> >
>>>>> >
>>>>> > CREATE EXTERNAL TABLE wiki_page(page_title string, page_body string)
>>>>> >
>>>>> > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>>>>> >
>>>>> > STORED AS TEXTFILE
>>>>> >
>>>>> > LOCATION '/home/wikimedia/output/3';
>>>>> >
>>>>> >
>>>>> >
>>>>> > It created for me a record as shown below:
>>>>> >
>>>>> >
>>>>> >
>>>>> > Davy Jones (musician) Davy Jones (musician)   David Thomas
>>>>> "Davy" Jones (30 December 1945 – 29 February 2012) was an English
>>>>> recording artist and actor, best known as a member of The Monkees. Early
>>>>> lifeDavy Jones was born at 20 Leamington Street, Openshaw, Manchester,
>>>>> England, on 30 December 1945. At age 11, he began his acting career…
>>>>> >
>>>>> >
>>>>> >
>>>>> > My overall objective is to know how many contributors are from India
>>>>> and China.
>>>>> >
>>>>> > Any suggestion how to achieve that?
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>


Re: How to load /t /n file to Hive

2013-10-06 Thread Sonal Goyal
Do you have the option to escape your tabs and newlines in your base file?
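If you can escape them, a plain delimited table can declare the escape
character. A minimal sketch, with assumed column names:

  CREATE TABLE raw_lines (col1 STRING, col2 STRING)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    ESCAPED BY '\\'
  STORED AS TEXTFILE;

Embedded newlines are harder, since TextInputFormat splits records on newlines
regardless of escaping, so those usually have to be removed or encoded before
the data lands in the file.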

Best Regards,
Sonal
Nube Technologies 






On Sat, Sep 21, 2013 at 12:34 AM, Raj Hadoop  wrote:

> Hi,
>
> I have a file which is delimited by a tab. Also, some fields in
> the file contain a tab \t character and a newline \n character within the
> field values.
>
> Is there any way to load this file using the Hive LOAD command? Or do I have
> to use a custom MapReduce InputFormat with Java? Please advise.
>
> Thanks,
> Raj
>


Re: Wikipedia Dump Analysis..

2013-10-06 Thread Sonal Goyal
Sorry, where is the contributor information coming from?

Best Regards,
Sonal
Nube Technologies 






On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina  wrote:

> > Hello,
> >
> >
> >
> > I have Hadoop running on HDFS with Hive installed. I am able to import
> Wikipedia dump into HDFS through the below command:
> >
> >
> >
> >
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> >
> >
> >
> > $ hadoop jar out.jar
> edu.umd.cloud9.collection.wikipedia.DumpWikipediaToPlainText -input
> /home/wikimedia/input/ enwiki-latest-pages-articles.xml  -output
> /home/wikimedia/output/3
> >
> >
> >
> > I am able to run Hive for the Wikipedia dump through this command:
> >
> >
> >
> > I have created one sample hive table based on small data I converted:
> >
> >
> >
> > CREATE EXTERNAL TABLE wiki_page(page_title string, page_body string)
> >
> > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> >
> > STORED AS TEXTFILE
> >
> > LOCATION '/home/wikimedia/output/3';
> >
> >
> >
> > It created for me a record as shown below:
> >
> >
> >
> > Davy Jones (musician) Davy Jones (musician)   David Thomas
> "Davy" Jones (30 December 1945 – 29 February 2012) was an English
> recording artist and actor, best known as a member of The Monkees. Early
> lifeDavy Jones was born at 20 Leamington Street, Openshaw, Manchester,
> England, on 30 December 1945. At age 11, he began his acting career…
> >
> >
> >
> > My overall objective is to know how many contributors are from India and
> China.
> >
> > Any suggestion how to achieve that?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Hive Connection Pooling

2013-10-06 Thread Sonal Goyal
Yes,  the Hive MetaStore does support JDBC connection pooling to the
underlying metastore database. You can configure this in hive-site.xml


<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>DBCP</value>
  <description>Uses a DBCP connection pool for JDBC metastore</description>
</property>


In addition, you can also pool threads that service the requests received
over the MetaStore Thrift interface.



Best Regards,
Sonal
Nube Technologies 






On Sat, Oct 5, 2013 at 12:03 AM, S R  wrote:

> Is there a connection pooling mechanism for Hive Metastore Service? I am
> using embedded Postgres for my testing but in production we are planning to
> use MySQL.
>


Re: how to treat an existing partition data file as a table?

2013-10-06 Thread Sonal Goyal
You can always alter your table to add partitions later on. See the syntax
below
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions
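For example, a rough sketch (the table name, partition column, and location are
hypothetical):

  ALTER TABLE mytable ADD IF NOT EXISTS
    PARTITION (dt='2013-09-01')
    LOCATION '/data/mytable/dt=2013-09-01';

A query that filters on dt then only scans the files under that one location.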

Best Regards,
Sonal
Nube Technologies 






On Tue, Oct 1, 2013 at 2:13 AM, Yang  wrote:

> thanks guys, I found that the table is not partitioned, so I guess no way
> out...
>
>
> On Mon, Sep 30, 2013 at 9:31 AM, Olga L. Natkovich wrote:
>
>>  You need to specify a table partition from which you want to sample.
>>
>> Olga
>>
>> From: Yang [mailto:tedd...@gmail.com]
>> Sent: Sunday, September 29, 2013 1:39 PM
>> To: hive-u...@hadoop.apache.org
>>
>> Subject: how to treat an existing partition data file as a table?
>>
>> we have a huge table, including browsing data for the past 5 years, let's
>> say. 
>>
>>
>> now I want to take a few samples to play around with it. so I did
>>
>> select * from mytable limit 10;
>>
>> but it actually went full out and tried to scan the entire table. is
>> there a way to kind of create a "view" pointing to only one of the data
>> files used by the original table mytable ?
>>
>> this way the total files to be scanned is much smaller.
>>
>>
>> thanks!
>> yang
>>
>
>


Re: how to treat an existing partition data file as a table?

2013-09-29 Thread Sonal Goyal
Is your table partitioned ?

Sent from my iPad

On Sep 30, 2013, at 2:09 AM, Yang  wrote:

> we have a huge table, including browsing data for the past 5 years, let's 
> say. 
> 
> now I want to take a few samples to play around with it. so I did
> select * from mytable limit 10;
> but it actually went full out and tried to scan the entire table. is there a 
> way to kind of create a "view" pointing to only one of the data files used by 
> the original table mytable ?
> this way the total files to be scanned is much smaller.
> 
> 
> thanks!
> yang


Re: Twitter Data analyse with HIVE

2012-06-05 Thread Sonal Goyal
Lfs means local file system. 

Hadoop fs -copyFromLocal will help to copy data from your local file system to 
the Hadoop distributed file system. Not sure what kind of cluster setup you 
have, are you running in local or pseudo distributed mode?

Here is a link to get you started on hive
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

You can specifically look for 'LOAD DATA LOCAL INPATH' for loading from the
local file system.
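For example, assuming a table named tweets already exists and the file sits on
your local disk (both the path and the table name here are placeholders):

  LOAD DATA LOCAL INPATH '/home/babak/tweets.txt' INTO TABLE tweets;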

And here is a link specifically regarding tweets.

http://www.cloudera.com/blog/2010/12/hadoop-world-2010-tweet-analysis/

Sent from my iPad

On 05-Jun-2012, at 9:27 PM, Babak Bastan  wrote:

> Thank you for your answer.
> location of file in lfs
> Does that mean the location of my *.txt file on my computer? And I have no
> destination address in HDFS; where can I get this location?
> Could you please write an example?
> 
> On Tue, Jun 5, 2012 at 4:29 PM, Bejoy Ks  wrote:
> Hi Babak
> 
> There isn't anything called hdfs files. Hdfs is just a file system that can 
> store any type of file. You just need to transfer your file from lfs to hdfs 
> and the following command helps you out for that
> 
> hadoop fs -copyFromLocal <location of file in lfs> <destination address in hdfs>
> 
> Regards
> Bejoy KS
> 
> From: Babak Bastan 
> To: user@hive.apache.org 
> Sent: Tuesday, June 5, 2012 7:54 PM
> Subject: Re: Twitter Data analyse with HIVE
> 
> OK, it makes no difference to me whether the records are on a single line or not:
>  2009-06-08 21:49:37 - http://twitter.com/evionblablabla- I think data mining 
> is awesome!
> 2009-06-08 21:49:37 - http://twitter.com/bliblibli -  I don’t think so. I 
> don’t like data mining
> 
> 
> How can I do that? I think that I should change my text file to an HDFS
> file, correct? How can I do this?
> Sorry, I'm very new to this field :(
> 
> On Tue, Jun 5, 2012 at 4:07 PM, Edward Capriolo  wrote:
> If you get output onto a single line it will be much easier for hive to 
> process.
> 
> On Tue, Jun 5, 2012 at 5:20 AM, Babak Bastan  wrote:
> > Hi experts
> >
> > I'm very new to Hive and Hadoop and I want to create a very simple demo to
> > analyse sample tweets like this:
> >
> > T 2009-06-08 21:49:37
> > U http://twitter.com/evion
> > W I think data mining is awesome!
> >
> > T 2009-06-08 21:49:37
> > U http://twitter.com/hyungjin
> > W I don’t think so. I don’t like data mining
> > 
> > Generally, is it possible to do that?
> > But I don't know exactly from which point I should start. Do you know any
> > simple and clear reference for doing this job? Or would you please tell me
> > (not in detail) what I should do?
> >
> > Thank you very much for your help
> > Babak
> 
> 
> 
> 


Re: Data migration in Hadoop

2011-09-13 Thread Sonal Goyal
Vikas,

I would suggest running the production clusters with replication factor of
3. Then you could decommission 2 nodes as Ayon suggests. Else one node at a
time.

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>





On Tue, Sep 13, 2011 at 11:20 PM, Ayon Sinha  wrote:

> What you can do for each node:
> 1. decommission node (or 2 nodes if you want to do this faster). You can do
> this with the excludes file.
> 2. Wait for blocks to be moved off the decommed node(s)
> 3. Replace the disks and put them back in service.
> 4. Repeat until done.
>
> -Ayon
> See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
> Also check out my Blog for answers to commonly asked 
> questions.<http://dailyadvisor.blogspot.com>
>
> --
> *From:* Vikas Srivastava 
> *To:* user@hive.apache.org
> *Sent:* Tuesday, September 13, 2011 5:27 AM
> *Subject:* Re: Data migration in Hadoop
>
> hey sonal!!
>
> Actually, right now we have an 11-node cluster, each node having 8 disks of
> 300GB and 8GB of RAM.
>
> Now what we want to do is replace those 300GB disks with 1TB disks so
> that we can have more space per server.
>
> we have replication factor 2.
>
> My suggestion is:
> 1. Add a node with 8TB to the cluster and run the balancer to balance the load.
> 2. Free any one node (the replacement node).
>
> Question: does a size imbalance between datanodes in the cluster create a
> problem or have any bad impact?
>
> regards
> Vikas Srivastava
>
>
> On Tue, Sep 13, 2011 at 5:37 PM, Sonal Goyal wrote:
>
> Hi Vikas,
>
> This was discussed in the groups recently:
>
> http://lucene.472066.n3.nabble.com/Fixing-a-bad-HD-tt2863634.html#none
>
> Are you looking at replacing all your datanodes, or only a few? how big is
> your cluster?
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
>
>
> On Tue, Sep 13, 2011 at 1:52 PM, Vikas Srivastava <
> vikas.srivast...@one97.net> wrote:
>
> HI ,
>
> Can anyone tell me how we can migrate Hadoop data or replace old hard disks
> with new, bigger HDDs?
>
> Actually I need to replace old 300GB HDDs with 1TB ones, so how can I do this
> efficiently?
>
> The problem is migrating the data from one HDD to the other.
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>
>
>
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>
>
>
>


Re: Data migration in Hadoop

2011-09-13 Thread Sonal Goyal
Hi Vikas,

This was discussed in the groups recently:

http://lucene.472066.n3.nabble.com/Fixing-a-bad-HD-tt2863634.html#none

Are you looking at replacing all your datanodes, or only a few? how big is
your cluster?

Best Regards,
Sonal
Crux: Reporting for HBase 
Nube Technologies 







On Tue, Sep 13, 2011 at 1:52 PM, Vikas Srivastava <
vikas.srivast...@one97.net> wrote:

> HI ,
>
> Can anyone tell me how we can migrate Hadoop data or replace old hard disks
> with new, bigger HDDs?
>
> Actually I need to replace old 300GB HDDs with 1TB ones, so how can I do this
> efficiently?
>
> The problem is migrating the data from one HDD to the other.
>
>
> --
> With Regards
> Vikas Srivastava
>
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
>
>


Re: Opposite of explode?

2011-02-10 Thread Sonal Goyal
Is collect_set what you are looking for? I haven't used it myself, but it
seems to remove the duplicates.

http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Built-in_Aggregate_Functions_.28UDAF.29
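Using your t1 example, a rough sketch would be:

  SELECT name, collect_set(id) AS ids
  FROM t1
  GROUP BY name;

That gives you an array per name. If you need the comma-separated string form,
something like concat_ws(',', collect_set(cast(id AS string))) should get you
there, though I haven't verified the exact behaviour on your Hive version.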

Thanks and Regards,
Sonal
Connect Hadoop with databases,
Salesforce, FTP servers and others 
Nube Technologies 







On Fri, Feb 11, 2011 at 9:43 AM, Tim Robertson wrote:

> Hi all,
>
> Sorry if I am missing something obvious but is there an inverse of an
> explode?
>
> E.g. given t1
>
> ID Name
> 1  Tim
> 2  Tim
> 3  Tom
> 4  Frank
> 5  Tim
>
> Can you create t2:
>
> Name ID
> Tim1,2,5
> Tom   3
> Frank 4
>
> In Oracle it would be a
>  select name,collect(id) from t1 group by name
>
> I suspect in Hive it is related to an Array but can't find the syntax
>
> Thanks for any pointers,
> Tim
>


Re: hive newbie - importing data into hive

2010-12-14 Thread Sonal Goyal
Sean,

You can refer to
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create.2BAC8-Drop_Table

You can define the ROW FORMAT DELIMITED as part of the table definition and
then load your data into the table.
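For your pipe-delimited file, a minimal sketch (the local path below is a
placeholder) would be:

  CREATE TABLE users (user_id INT, age INT, gender STRING,
                      occupation STRING, zip_code STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
  STORED AS TEXTFILE;

  LOAD DATA LOCAL INPATH '/path/to/users.dat' INTO TABLE users;

No substitution to Ctrl-A and no MySQL/Sqoop round trip should be needed.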

Thanks and Regards,
Sonal
Connect Hadoop with databases,
Salesforce, FTP servers and others 
Nube Technologies 







On Wed, Dec 15, 2010 at 10:29 AM, Sean Curtis  wrote:

> thanks Bryan
>
> I should have been more specific. I checked that guide, and it didn't seem
> obvious with LOAD INFILE how to take a file that was already pipe-delimited
> and import it directly.
>
> Is there a section of the doc I may have missed, or some tip that can help
> there?
>
> sean
>
> On Dec 14, 2010, at 11:36 PM, Bryan Talbot wrote:
>
> I'll help by recommending that you get started by looking at the "Getting
> Started Guide".
>
> http://wiki.apache.org/hadoop/Hive/GettingStarted
>
>
> -Bryan
>
>
> On Tue, Dec 14, 2010 at 8:23 PM, Sean Curtis wrote:
>
>> Just wondering, if I have a pipe-delimited file, how can I just import this
>> data into Hive?
>>
>> Basically I am using the microlens database, which is pipe-separated. For
>> example:
>>
>> user id | age | gender | occupation | zip code
>>
>> translates to
>>
>> 123 | 24 | M | worker | 12345
>>
>>
>> I'd like to just import this straight into Hive. My initial thoughts:
>> 1. Use the Unix substitute command and change all "|" to "Ctrl-A".
>> 2. Import into MySQL, then use Sqoop.
>>
>> It seems it should be easier than this. Can someone help?
>>
>> thanks for the help.
>>
>> sean
>
>
>
>


Re: Only a single expression in the SELECT clause is supported with UDTF's

2010-11-08 Thread Sonal Goyal
Hi Tim,

I guess you are running into limitations while using UDTFs. Check
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#UDTF. I think you
should be able to use lateral view in your query.
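A rough sketch against your table (the output column aliases k_id and p_id are
arbitrary names I've picked):

  SELECT t.k_id, t.p_id
  FROM temp_kingdom_phylum
  LATERAL VIEW taxonDensityUDTF(kingdom_concept_id, phylum_concept_id) t
    AS k_id, p_id;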

Thanks and Regards,
Sonal

Sonal Goyal | Founder and CEO | Nube Technologies LLP
http://www.nubetech.co
http://code.google.com/p/hiho/
<http://in.linkedin.com/in/sonalgoyal>





On Mon, Nov 8, 2010 at 2:11 PM, Tim Robertson wrote:

> Hi all,
>
> I am trying my first UDTF, but can't seem to get it to run.  Can
> anyone spot anything wrong with this please:
>
> hive> select taxonDensityUDTF(kingdom_concept_id, phylum_concept_id)
> as p,k from temp_kingdom_phylum;
> FAILED: Error in semantic analysis: Only a single expression in the
> SELECT clause is supported with UDTF's
> hive>
>
> Below is my code.  Thanks for any pointers,
>
> Tim
>
>
>
> @Description(
>  name = "taxonDensityUDTF",
>  value = "_FUNC_(kingdom_concept_id, phylum_concept_id)"
>)
> public class TaxonDensityUDTF extends GenericUDTF {
>Integer kingdom_concept_id = Integer.valueOf(0);
>Integer phylum_concept_id = Integer.valueOf(0);
>
>/**
> * @see org.apache.hadoop.hive.ql.udf.generic.GenericUDTF#close()
> */
>@Override
>public void close() throws HiveException {
>Object[] forwardObj = new Object[2];
>forwardObj[0] = kingdom_concept_id;
>forwardObj[1] = phylum_concept_id;
>forward(forwardObj);
>// TEST STUFF FOR NOW
>forwardObj = new Object[2];
>forwardObj[0] = kingdom_concept_id+1;
>forwardObj[1] = phylum_concept_id+1;
>forward(forwardObj);
>}
>
>/**
> * @see
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTF#initialize(org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector[])
> */
>@Override
>public StructObjectInspector initialize(ObjectInspector[] arg0)
> throws UDFArgumentException {
>ArrayList<String> fieldNames = new ArrayList<String>();
>ArrayList<ObjectInspector> fieldOIs = new
> ArrayList<ObjectInspector>();
>fieldNames.add("kingdom_concept_id");
>fieldNames.add("phylum_concept_id");
>
>  fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
>
>  fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
>return
> ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,fieldOIs);
>}
>
>/**
> * @see
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTF#process(java.lang.Object[])
> */
>@Override
>public void process(Object[] args) throws HiveException {
>kingdom_concept_id = (Integer) args[0];
>phylum_concept_id = (Integer) args[1];
>}
> }
>


Re: Unions causing many scans of input - workaround?

2010-11-07 Thread Sonal Goyal
Hey Tim,

You have an interesting problem. Have you tried creating a UDTF for your
case, so that you can possibly emit more than one record for each row of
your input?

http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF
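As a sketch of the shape this could take (tileExplode is a hypothetical UDTF
that emits one row per zoom level and name_id derivation for each input record,
and my_table stands in for your real table name):

  SELECT z.name_id, z.x, z.y, z.zoom, count(1) AS cnt
  FROM my_table
  LATERAL VIEW tileExplode(name_id, longitude, latitude) z
    AS name_id, x, y, zoom
  GROUP BY z.name_id, z.x, z.y, z.zoom;

That keeps it to a single scan of the table, with the fan-out happening inside
the UDTF instead of in UNION ALL branches.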

Thanks and Regards,
Sonal

Sonal Goyal | Founder and CEO | Nube Technologies LLP
http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal





On Mon, Nov 8, 2010 at 2:31 AM, Tim Robertson wrote:

> Hi all,
>
> I am porting custom MR code to Hive and have written working UDFs
> where I need them.  Is there a work around to having to do this in
> Hive:
>
> select * from
> (
>select name_id, toTileX(longitude,0) as x, toTileY(latitude,0) as
> y, 0 as zoom, funct2(longitude, 0) as f2_x, funct2(latitude,0) as
> f2_y, count (1) as count
>from table
>group by name_id, x, y, f2_x, f2_y
>
>UNION ALL
>
>select name_id, toTileX(longitude,1) as x, toTileY(latitude,1) as
> y, 1 as zoom, funct2(longitude, 1) as f2_x, funct2(latitude,1) as
> f2_y, count (1) as count
>from table
>group by name_id, x, y, f2_x, f2_y
>
>   --- etc etc increasing in zoom
> )
>
> The issue being that this does many passes over the table, whereas
> previously in my Map() I would just emit many times from the same
> input record and then let it all group in the shuffle and sort.
> I actually emit 184 times for an input record (23 zoom levels of
> google maps, and 8 ways to derive the name_id) for a single record
> which means 184 union statements - Is it possible in hive to force it
> to emit many times from the source record in the stage-1 map?
>
> (ahem) Does anyone know if Pig can do this if not in Hive?
>
> I hope I have explained this well enough to make sense.
>
> Thanks in advance,
> Tim
>


Hive and Hadoop 0.21.0

2010-10-22 Thread Sonal Goyal
Hi,

I need to get Hive working on a 0.21.0 Hadoop cluster. Can someone please
let me know how it can be done. I tried HIVE-1612 but it did not work for
me. Am I missing something?

Thanks and Regards,
Sonal

Sonal Goyal | Founder and CEO | Nube Technologies LLP
http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal