Re: Hive on Spark engine

2016-03-26 Thread Ted Yu
According to:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/bk_HDP_RelNotes-20151221.pdf

Spark 1.5.2 comes out of the box.

Suggest moving questions on HDP to Hortonworks forum.

Cheers

On Sat, Mar 26, 2016 at 3:32 PM, Mich Talebzadeh 
wrote:

> Thanks Jorn.
>
> Just to be clear, do they get Hive working with Spark 1.6 out of the box
> (binary download)? The usual work-around is to build your own package and
> copy the spark-assembly jar over to $HIVE_HOME/lib.
>
>
> Cheers
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 26 March 2016 at 22:08, Jörn Franke  wrote:
>
>> If you check the newest Hortonworks distribution then you see that it
>> generally works. Maybe you can borrow some of their packages. Alternatively
>> it should be also available in other distributions.
>>
>> On 26 Mar 2016, at 22:47, Mich Talebzadeh 
>> wrote:
>>
>> Hi,
>>
>> I am running Hive 2 and now Spark 1.6.1 but I still do not see any sign
>> that Hive can utilise a Spark engine higher than 1.3.1
>>
>> My understanding was that there was a mismatch in the spark-assembly jar
>> files that prevents Hive from running on Spark using the binary
>> downloads. I just tried Hive 2 on Spark 1.6 as the execution engine and it
>> crashed.
>>
>> I do not know the development state of this cross-breed, but it would be very
>> desirable if we could manage to sort out
>> this spark-assembly-1.x.1-hadoop2.4.0.jar issue once and for all.
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>>
>
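For reference, the manual workaround described above usually boils down to two steps. A minimal sketch, assuming a Spark build made without the Hive profile and an assembly jar matching your Hadoop version (the jar name and master value below are illustrative; hive.execution.engine and spark.master are the standard Hive-on-Spark properties):

  # copy the Spark assembly into Hive's lib directory
  cp $SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar $HIVE_HOME/lib/

  -- then, in the Hive session (or persistently in hive-site.xml)
  set hive.execution.engine=spark;
  set spark.master=yarn-client;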


Re: binary file deserialization

2016-03-09 Thread Ted Yu
bq. there is a varying number of items for that record

If the combination of items is very large, using a case class would be
tedious.

On Wed, Mar 9, 2016 at 9:57 AM, Saurabh Bajaj 
wrote:

> You can load that binary up as a String RDD, then map over that RDD and
> convert each row to your case class representing the data. In the map stage
> you could also map the input string into an RDD of JSON values and use the
> following function to convert it into a DF
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
>
> val anotherPeople = sqlContext.read.json(anotherPeopleRDD)
>
>
> On Wed, Mar 9, 2016 at 9:15 AM, Ruslan Dautkhanov 
> wrote:
>
>> We have a huge binary file in a custom serialization format (e.g. the header
>> tells the length of the record, then there is a varying number of items for
>> that record). This is produced by an old C++ application.
>> What would be the best approach to deserialize it into a Hive table or a
>> Spark RDD?
>> Format is known and well documented.
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>
>
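A minimal sketch of the approach suggested above, for the Spark 1.x shell (sc and sqlContext available). The record layout, field names, and path are purely illustrative assumptions; the real parser has to follow the documented format:

  import java.io.{DataInputStream, EOFException}

  // assumed layout: a 4-byte item count, then that many 8-byte doubles per record
  case class Record(items: Seq[Double])

  def readRecords(in: DataInputStream): Iterator[Record] = new Iterator[Record] {
    private var nextRec: Option[Record] = advance()
    private def advance(): Option[Record] =
      try {
        val n = in.readInt()                        // header: number of items
        Some(Record(Seq.fill(n)(in.readDouble())))  // payload: n doubles
      } catch { case _: EOFException => in.close(); None }
    def hasNext = nextRec.isDefined
    def next() = { val r = nextRec.get; nextRec = advance(); r }
  }

  // one PortableDataStream per file; the parsing runs on the executors
  val records = sc.binaryFiles("hdfs:///data/legacy/*.bin")
    .flatMap { case (_, stream) => readRecords(stream.open()) }

  // build a DataFrame from the case-class RDD (or map to JSON strings and use read.json)
  val df = sqlContext.createDataFrame(records)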


Re: error while defining custom schema in Spark 1.5.0

2015-12-25 Thread Ted Yu
The error was due to the "blank" field being defined twice.

On Tue, Dec 22, 2015 at 12:03 AM, Divya Gehlot 
wrote:

> Hi,
> I am a newbie to Apache Spark, using the CDH 5.5 QuickStart VM with Spark
> 1.5.0.
> I am working on a custom schema and getting an error:
>
> import org.apache.spark.sql.hive.HiveContext
>>>
>>> scala> import org.apache.spark.sql.hive.orc._
>>> import org.apache.spark.sql.hive.orc._
>>>
>>> scala> import org.apache.spark.sql.types.{StructType, StructField,
>>> StringType, IntegerType};
>>> import org.apache.spark.sql.types.{StructType, StructField, StringType,
>>> IntegerType}
>>>
>>> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>> 15/12/21 23:41:53 INFO hive.HiveContext: Initializing execution hive,
>>> version 1.1.0
>>> 15/12/21 23:41:53 INFO client.ClientWrapper: Inspected Hadoop version:
>>> 2.6.0-cdh5.5.0
>>> 15/12/21 23:41:53 INFO client.ClientWrapper: Loaded
>>> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.5.0
>>> hiveContext: org.apache.spark.sql.hive.HiveContext =
>>> org.apache.spark.sql.hive.HiveContext@214bd538
>>>
>>> scala> val customSchema = StructType(Seq(StructField("year",
>>> IntegerType, true),StructField("make", StringType,
>>> true),StructField("model", StringType, true),StructField("comment",
>>> StringType, true),StructField("blank", StringType, true)))
>>> customSchema: org.apache.spark.sql.types.StructType =
>>> StructType(StructField(year,IntegerType,true),
>>> StructField(make,StringType,true), StructField(model,StringType,true),
>>> StructField(comment,StringType,true), StructField(blank,StringType,true))
>>>
>>> scala> val customSchema = (new StructType).add("year", IntegerType,
>>> true).add("make", StringType, true).add("model", StringType,
>>> true).add("comment", StringType, true).add("blank", StringType, true)
>>> customSchema: org.apache.spark.sql.types.StructType =
>>> StructType(StructField(year,IntegerType,true),
>>> StructField(make,StringType,true), StructField(model,StringType,true),
>>> StructField(comment,StringType,true), StructField(blank,StringType,true))
>>>
>>> scala> val customSchema = StructType( StructField("year", IntegerType,
>>> true) :: StructField("make", StringType, true) :: StructField("model",
>>> StringType, true) :: StructField("comment", StringType, true) ::
>>> StructField("blank", StringType, true)::StructField("blank", StringType,
>>> true))
>>> :24: error: value :: is not a member of
>>> org.apache.spark.sql.types.StructField
>>>val customSchema = StructType( StructField("year", IntegerType,
>>> true) :: StructField("make", StringType, true) :: StructField("model",
>>> StringType, true) :: StructField("comment", StringType, true) ::
>>> StructField("blank", StringType, true)::StructField("blank", StringType,
>>> true))
>>>
>>
> Tried like like below also
>
> scala> val customSchema = StructType( StructField("year", IntegerType,
> true), StructField("make", StringType, true) ,StructField("model",
> StringType, true) , StructField("comment", StringType, true) ,
> StructField("blank", StringType, true),StructField("blank", StringType,
> true))
> :24: error: overloaded method value apply with alternatives:
>   (fields:
> Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
> 
>   (fields:
> java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
> 
>   (fields:
> Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
>  cannot be applied to (org.apache.spark.sql.types.StructField,
> org.apache.spark.sql.types.StructField,
> org.apache.spark.sql.types.StructField,
> org.apache.spark.sql.types.StructField,
> org.apache.spark.sql.types.StructField,
> org.apache.spark.sql.types.StructField)
>val customSchema = StructType( StructField("year", IntegerType,
> true), StructField("make", StringType, true) ,StructField("model",
> StringType, true) , StructField("comment", StringType, true) ,
> StructField("blank", StringType, true),StructField("blank", StringType,
> true))
>   ^
> Would really appreciate it if somebody could share an example which works with
> Spark 1.4 or Spark 1.5.0.
>
> Thanks,
> Divya
>
>
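For what it's worth, besides the duplicated "blank" column, the two failing attempts also have structural problems: the ::-chained form must end with Nil (so the fields form a List), and StructType's apply takes a Seq or Array of fields rather than individual StructField arguments. A working variant with the same illustrative columns:

  import org.apache.spark.sql.types._

  val customSchema = StructType(
    StructField("year", IntegerType, true) ::
    StructField("make", StringType, true) ::
    StructField("model", StringType, true) ::
    StructField("comment", StringType, true) ::
    StructField("blank", StringType, true) :: Nil)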


Re: Intermittent BindException during long MR jobs

2015-02-28 Thread Ted Yu
Krishna:
Please take a look at:
http://wiki.apache.org/hadoop/BindException

Cheers

On Thu, Feb 26, 2015 at 10:30 PM,  wrote:

> Hello Krishna,
>
>
>
> The exception seems to be IP specific. It might have occurred due to the
> unavailability of an IP address on the system to assign. Double-check the IP
> address availability and rerun the job.
>
>
>
> *Thanks,*
>
> *S.RagavendraGanesh*
>
> ViSolve Hadoop Support Team
> ViSolve Inc. | San Jose, California
> Website: www.visolve.com
>
> email: servi...@visolve.com | Phone: 408-850-2243
>
>
>
>
>
> *From:* Krishna Rao [mailto:krishnanj...@gmail.com]
> *Sent:* Thursday, February 26, 2015 9:48 PM
> *To:* user@hive.apache.org; u...@hadoop.apache.org
> *Subject:* Intermittent BindException during long MR jobs
>
>
>
> Hi,
>
>
>
> We occasionally run into a BindException that causes long-running jobs to
> fail.
>
>
>
> The stacktrace is below.
>
>
>
> Any ideas what this could be caused by?
>
>
>
> Cheers,
>
>
>
> Krishna
>
>
>
>
>
> Stacktrace:
>
> 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job
> Submission failed with exception 'java.net.BindException(Problem binding to
> [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address;
> For more details see: http://wiki.apache.org/hadoop/BindException)'
>
> java.net.BindException: Problem binding to [back10/10.4.2.10:0]
> java.net.BindException: Cannot assign requested address; For more details
> see:  http://wiki.apache.org/hadoop/BindException
>
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1242)
>
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>
> at com.sun.proxy.$Proxy10.create(Unknown Source)
>
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
>
> at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>
> at com.sun.proxy.$Proxy11.create(Unknown Source)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1376)
>
> at
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
>
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
>
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
>
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
>
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
>
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
>
> at
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
>
> at
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
>
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
>
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
>
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>
> 

Re: unsubscribe

2013-07-18 Thread Ted Yu
You have to send a mail to user-unsubscr...@hive.apache.org

On Thu, Jul 18, 2013 at 1:30 PM, Beau Rothrock wrote:

>


Re: getRegion method is missing from RegionCoprocessorEnvironment class in version 0.94

2013-07-11 Thread Ted Yu
Which release of 0.94 do you use ?

For tip of 0.94, I see:

public interface RegionCoprocessorEnvironment extends
CoprocessorEnvironment {
  /** @return the region associated with this coprocessor */
  public HRegion getRegion();

On Thu, Jul 11, 2013 at 6:40 PM, ch huang  wrote:

> What can I do to change the example code based on the 0.92 API so it can
> run in 0.94?
>


Re: Variable resolution Fails

2013-05-01 Thread Ted Yu
Naidu:
Please don't hijack existing thread. Your questions are not directly related to 
Hive. 

Cheers

On May 1, 2013, at 12:53 AM, Naidu MS  wrote:

> Hi, I have two questions regarding HDFS and the jps utility.
> 
> I am new to Hadoop and started learning Hadoop this past week.
> 
> 1. Whenever I run start-all.sh and then jps in the console, it shows the
> processes started
> 
> naidu@naidu:~/work/hadoop-1.0.4/bin$ jps
> 22283 NameNode
> 23516 TaskTracker
> 26711 Jps
> 22541 DataNode
> 23255 JobTracker
> 22813 SecondaryNameNode
> Could not synchronize with target
> 
> But along with the list of processes started, it always shows "Could not 
> synchronize with target" in the jps output. What is meant by "Could not 
> synchronize with target"? Can someone explain why this is happening?
> 
> 
> 2. Is it possible to format the namenode multiple times? When I enter the 
> namenode -format command, it does not format the namenode and shows the 
> following output.
> 
> naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format
> Warning: $HADOOP_HOME is deprecated.
> 
> 13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG: 
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = naidu/127.0.0.1
> STARTUP_MSG:   args = [-format]
> STARTUP_MSG:   version = 1.0.4
> STARTUP_MSG:   build = 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 
> 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
> /
> Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y
> Format aborted in /home/naidu/dfs/namenode
> 13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1
> 
> /
> 
> Can someone help me understand this? Why is it not possible to format 
> the namenode multiple times?
> 
> 
> On Wed, May 1, 2013 at 8:14 AM, Sanjay Subramanian 
>  wrote:
>> +1  agreed
>> 
>> Also as a general script programming practice I check if the variables I am 
>> going to use are NON empty before using them…nothing related to Hive scripts
>> 
>> if [ "${freq}" == "" ]
>> then
>>    echo "variable freq is empty…exiting"
>>    exit 1
>> fi
>>  
>> 
>> 
>> From: Anthony Urso 
>> Reply-To: "user@hive.apache.org" 
>> Date: Tuesday, April 30, 2013 7:20 PM
>> To: "user@hive.apache.org" , sumit ghosh 
>> 
>> Subject: Re: Variable resolution Fails
>> 
>> Your shell is expanding the variable ${env:freq}, which doesn't exist in the 
>> shell's environment, so hive is getting the empty string in that place.  If 
>> you are always intending to run your query like this, just use ${freq} which 
>> will be expanded as expected by bash and then passed to hive.
>> 
>> Cheers,
>> Anthony
>> 
>> 
>> On Tue, Apr 30, 2013 at 4:40 PM, sumit ghosh  wrote:
>>> Hi,
>>>  
>>> The following variable freq fails to resolve:
>>>  
>>> bash-4.1$ export freq=MNTH
>>> bash-4.1$ echo $freq
>>> MNTH
>>> bash-4.1$ hive -e "select ${env:freq} as dr  from dual"
>>> Logging initialized using configuration in 
>>> file:/etc/hive/conf.dist/hive-log4j.properties
>>> Hive history 
>>> file=/hadoop1/hive_querylog/sumighos/hive_job_log_sumighos_201304302321_1867815625.txt
>>> FAILED: ParseException line 1:8 cannot recognize input near 'as' 'dr' 
>>> 'from' in select clause
>>> bash-4.1$
>>>  
>>> Here dual is a table with 1 row.
>>> What am I doing wrong? When I try to resolve freq, it is empty!
>>>  
>>>   $ hive -S -e "select '${env:freq}' as dr  from dual"
>>>  
>>>   $
>>>  
>>> Thanks,
>>> Sumit
>> 
>> 
>> CONFIDENTIALITY NOTICE
>> ==
>> This email message and any attachments are for the exclusive use of the 
>> intended recipient(s) and may contain confidential and privileged 
>> information. Any unauthorized review, use, disclosure or distribution is 
>> prohibited. If you are not the intended recipient, please contact the sender 
>> by reply email and destroy all copies of the original message along with any 
>> attachments, from your computer system. If you are the intended recipient, 
>> please be advised that the content of this message is subject to access, 
>> review and disclosure by the sender's Email System Administrator.
> 
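The short version: inside double quotes the shell substitutes ${env:freq} itself (and comes up empty), so Hive never sees the reference. Either of the following works (a sketch, assuming freq is exported and Hive's variable substitution is left at its default, enabled):

  export freq=MNTH

  # let bash substitute the value before Hive parses the query
  hive -e "select '${freq}' as dr from dual"

  # or single-quote the command so Hive resolves it from the env namespace
  hive -e 'select "${env:freq}" as dr from dual'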


Re: no data in external table

2012-10-04 Thread Ted Yu
Can you tell us how you created mapping for the existing table ?

In task log, do you see any connection attempt to HBase ?

Cheers

On Thu, Oct 4, 2012 at 11:30 AM,  wrote:

> Hello,
>
> I use hive-0.9.0 with hadoop-0.20.2 and hbase-0.92.1. I have created an
> external table, mapping it to an existing table in HBase. When I do "select
> * from myexternaltable" it returns no results, although a scan in HBase shows
> data, and I do not see any errors in the jobtracker log.
>
> Any ideas how to debug this issue?
>
> Thanks.
> Alex.
>
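For reference, a typical mapping onto an existing HBase table looks roughly like this (table, column family and qualifier names are illustrative); the first things to check are that hbase.columns.mapping lines up with the Hive columns and that hbase.table.name names the existing HBase table exactly:

  CREATE EXTERNAL TABLE myexternaltable (key STRING, val STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
  TBLPROPERTIES ("hbase.table.name" = "existing_hbase_table");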


Re: HBase aggregate query

2012-09-10 Thread Ted Yu
Hi,
Are you able to get the number you want through hive log ?

Thanks

On Mon, Sep 10, 2012 at 7:03 AM, iwannaplay games <
funnlearnfork...@gmail.com> wrote:

> Hi ,
>
> I want to run query like
>
> select month(eventdate),scene,count(1),sum(timespent) from eventlog
> group by month(eventdate),scene
>
>
> in HBase. Through Hive it is taking a lot of time for 40 million
> records. Do we have any syntax in HBase to find the result? In SQL
> Server it takes around 9 minutes; how long might it take in HBase?
>
> Regards
> Prabhjot
>


Re: HBaseSerDe

2012-07-25 Thread Ted Yu
The ctor is used in TestHBaseSerDe.java

So maybe change it to package private ?

On Wed, Jul 25, 2012 at 12:43 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> While going through some code for HBase/Hive Integration, I came across
> this constructor:
>
> public HBaseSerDe() throws SerDeException {
>
> }
>
> Basically, the constructor does nothing but declare a thrown exception.
> The problem is that fixing this now would be a non-passive change.
>
> I couldn't really find an obvious reason for this to be there. Are there
> any objections if I file a JIRA to remove this constructor?
> --
> Swarnim
>


Re: Hive and CDH4 GA

2012-07-16 Thread Ted Yu
I see the following in build.properties :

hadoop.version=${hadoop-0.20.version}
hadoop.security.version=${hadoop-0.20S.version}

Have you tried to override the above property values when building ?

If it still fails, please comment on
HIVE-3029 (https://issues.apache.org/jira/browse/HIVE-3029).

Thanks

On Mon, Jul 16, 2012 at 2:42 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> I found this issue [1] and applied the patch but still the issue persists.
>
> Any different way that I should be creating my assembly (currently just
> doing "ant clean tar") so that it works with hadoop 2.0.0 on its classpath?
>
> Any help is appreciated.
>
> Thanks,
>
> [1] https://issues.apache.org/jira/browse/HIVE-3029
>
>
> On Fri, Jul 13, 2012 at 10:27 PM, Ted Yu  wrote:
>
>> See
>> https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276
>>
>> Cheers
>>
>>
>> On Fri, Jul 13, 2012 at 12:38 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> Has anyone been using the Hive 0.9.0 release with the CDH4 GA release? I
>>> keep hitting this exception on its interaction with HBase.
>>>
>>> java.lang.NoSuchMethodError:
>>> org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
>>>  at
>>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:363)
>>> at
>>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1026)
>>>  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:878)
>>> at
>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>>>  at $Proxy9.getProtocolVersion(Unknown Source)
>>> at
>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
>>> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>>>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
>>> at
>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642)
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>
>
> --
> Swarnim
>
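For reference, overriding those properties with the same "ant clean tar" build mentioned above would look roughly like the following; the CDH4 version strings are illustrative and should match the cluster's Hadoop artifacts:

  ant clean tar -Dhadoop.version=2.0.0-cdh4.0.0 -Dhadoop.security.version=2.0.0-cdh4.0.0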


Re: Hive and CDH4 GA

2012-07-13 Thread Ted Yu
See
https://issues.apache.org/jira/browse/HADOOP-8350?focusedCommentId=13414276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414276

Cheers

On Fri, Jul 13, 2012 at 12:38 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Has anyone been using the Hive 0.9.0 release with the CDH4 GA release? I keep
> hitting this exception on its interaction with HBase.
>
> java.lang.NoSuchMethodError:
> org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
>  at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:363)
> at
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1026)
>  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:878)
> at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>  at $Proxy9.getProtocolVersion(Unknown Source)
> at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
> at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642)
>
> --
> Swarnim
>


Re: What virtual column does hive.exec.rowoffset add?

2012-03-25 Thread Ted Yu
From VirtualColumn.java:
  public static VirtualColumn ROWOFFSET = new
VirtualColumn("ROW__OFFSET__INSIDE__BLOCK",
(PrimitiveTypeInfo)TypeInfoFactory.longTypeInfo);

Cheers

On Sun, Mar 25, 2012 at 2:30 PM, Edward Capriolo wrote:

> <property>
>   <name>hive.exec.rowoffset</name>
>   <value>false</value>
>   <description>Whether to provide the row offset virtual
> column</description>
> </property>
>
> I know the others are INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE. Does
> anyone know what this third column is?
>
> Edward
>


Re: Running Hive Queries in Parallel?

2011-07-08 Thread Ted Yu
Set it in conf/hive-site.xml

On Thu, Jul 7, 2011 at 10:59 PM, hadoop n00b  wrote:

> Found hive.exec.parallel!!!
> How can I set it to be 'true' by default?
>
> Thanks!
>
>
> On Fri, Jul 8, 2011 at 11:14 AM, hadoop n00b  wrote:
>
>> Hello,
>>
>> When I execute multiple queries in Hive, the mapred tasks are queued up
>> and executed one by one. Is there a way I could set Hive or Hadoop to
>> execute mapred tasks in parallel?
>>
>> I am running on Hive 0.4.1 and Hadoop 0.20
>>
>> Tx!
>>
>
>
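For reference, turning it on by default amounts to one property in conf/hive-site.xml (hive.exec.parallel; the related hive.exec.parallel.thread.number controls how many stages may run at once):

  <property>
    <name>hive.exec.parallel</name>
    <value>true</value>
  </property>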


Re: URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread Ted Yu
There're 3 links on that page.

On Mon, Jun 27, 2011 at 2:39 PM, Ted Yu  wrote:

> wiki has moved.
> See
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
>
>
> On Mon, Jun 27, 2011 at 2:31 PM, Ayon Sinha  wrote:
>
>>
>> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
>> is empty
>> and the old link is gone.
>>
>> -Ayon
>> See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
>> Also check out my Blog for answers to commonly asked 
>> questions.<http://dailyadvisor.blogspot.com>
>>
>
>


Re: URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread Ted Yu
wiki has moved.
See
https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer

On Mon, Jun 27, 2011 at 2:31 PM, Ayon Sinha  wrote:

>
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
> is empty
> and the old link is gone.
>
> -Ayon
> See My Photos on Flickr 
> Also check out my Blog for answers to commonly asked 
> questions.
>


Re: How to change the hive.metastore.warehouse.dir ?

2011-05-17 Thread Ted Yu
Can you try as user hadoop ?

Cheers



On May 17, 2011, at 9:53 PM, jinhang du  wrote:

> Hi,
> The default value is "/user/hive/warehouse" in hive-site.xml. After I changed 
> the directory to a path on HDFS, I got the following exception.
> 
> FAILED: Error in metadata: MetaException(message:Got exception: 
> org.apache.hadoop.security.
> AccessControlException org.apache.hadoop.security.AccessControlException: 
> Permission denied: 
> user=root, access=WRITE, inode="output 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> 
> Is this failure related to the hadoop-site.xml or something?
> Thanks for your help.
> 
> -- 
> dujinhang
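The AccessControlException just means the user running Hive (root here) cannot write to the new warehouse path. A sketch of the usual fix, assuming the HDFS superuser is 'hadoop' and the new location is /user/hive/output:

  # run these as the HDFS superuser
  hadoop fs -mkdir /user/hive/output
  hadoop fs -chown -R root /user/hive/output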


Re: Hive/hbase integration - Rebuild the Storage Handler

2011-03-22 Thread Ted Yu
Look at this line in ivy/libraries.properties:
hbase.version=0.89.0-SNAPSHOT

The jars would be downloaded to:
./build/ivy/lib/default/hbase-0.89.0-SNAPSHOT-tests.jar
./build/ivy/lib/default/hbase-0.89.0-SNAPSHOT.jar

After changing hbase.version, I saw:

[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve]   module not found:
org.apache.hbase#hbase;0.90.0-SNAPSHOT
[ivy:resolve]    hadoop-source: tried
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar:
[ivy:resolve]
http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hbase-0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar(test-jar):
[ivy:resolve]
http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hbase-0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve]    apache-snapshot: tried
[ivy:resolve]
https://repository.apache.org/content/repositories/snapshots/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.pom
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar:
[ivy:resolve]
https://repository.apache.org/content/repositories/snapshots/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar(test-jar):
[ivy:resolve]
https://repository.apache.org/content/repositories/snapshots/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT-tests.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]
http://repo1.maven.org/maven2/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.pom
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar:
[ivy:resolve]
http://repo1.maven.org/maven2/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar(test-jar):
[ivy:resolve]
http://repo1.maven.org/maven2/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve]    datanucleus-repo: tried
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar:
[ivy:resolve]
http://www.datanucleus.org/downloads/maven2/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve] -- artifact
org.apache.hbase#hbase;0.90.0-SNAPSHOT!hbase.jar(test-jar):
[ivy:resolve]
http://www.datanucleus.org/downloads/maven2/org/apache/hbase/hbase/0.90.0-SNAPSHOT/hbase-0.90.0-SNAPSHOT.jar
[ivy:resolve]   ::
[ivy:resolve]   ::  UNRESOLVED DEPENDENCIES ::
[ivy:resolve]   ::
[ivy:resolve]   :: org.apache.hbase#hbase;0.90.0-SNAPSHOT: not found
[ivy:resolve]   ::

Copying d...@hbase.apache.org for further comment.

On Tue, Mar 22, 2011 at 8:40 AM, Jean-Charles Thomas <
jctho...@autoscout24.com> wrote:

>  Hi,
>
>
>
> I am using hbase 0.90 and Hive 0.7 and would like to try the hive/hbase
> integration. From the wiki doc I could see that I have to rebuild the
> handler:
>
>
>
> “If you are not using hbase-0.89.0, you will need to rebuild the handler
> with the HBase jar matching your version, and change the --auxpath above
> accordingly”
>
>
>
> Can someone explain in more detail how this can be done? I unfortunately
> only have basic Java knowledge.
>
>
>
> Thanks in advance,
>
>
>
> JC
>


Re: Hi,all.Is it possible to generate mutiple records in one SerDe?

2011-03-21 Thread Ted Yu
I don't think so:
  Object deserialize(Writable blob) throws SerDeException;


On Mon, Mar 21, 2011 at 4:55 AM, 幻  wrote:

> Hi all. Is it possible to generate multiple records in one SerDe? I mean,
> can I return more than one row in deserialize?
>
> Thanks!
>


Re: skew join optimization

2011-03-20 Thread Ted Yu
How about link to http://imageshack.us/ or TinyPic ?

Thanks

On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo wrote:

> On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu  wrote:
> > Can someone re-attach the missing figures for that wiki ?
> >
> > Thanks
> >
> > On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
> >  wrote:
> >>
> >> Hi Igor,
> >>
> >> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the
> >> jira 1642 which automatically converts a normal join into map-join
> >> (Otherwise you can specify the mapjoin hints in the query itself.).
> >> Because your 'S' table is very small , it can be replicated across all
> >> the mappers and the reduce phase can be avoided. This can greatly
> >> reduce the runtime .. (See the results section in the page for
> >> details.).
> >>
> >> Hope this helps.
> >>
> >> Thanks
> >>
> >>
> >> On Sun, Mar 20, 2011 at 6:37 PM, Jov  wrote:
> >> > 2011/3/20 Igor Tatarinov :
> >> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> >> >> because of a single reduce task that gets the bulk of the work:
> >> >> SELECT ...
> >> >> FROM T
> >> >> LEFT OUTER JOIN S
> >> >> ON T.timestamp = S.timestamp and T.id = S.id
> >> >> This is a 1:0/1 join so the size of the output is exactly the same as
> >> >> the
> >> >> size of T (500M records). S is actually very small (5K).
> >> >> I've tried:
> >> >> - switching the order of the join conditions
> >> >> - using a different hash function setting (jenkins instead of murmur)
> >> >> - using SET set hive.auto.convert.join = true;
> >> >
> >> > are you sure your query convert to mapjoin? if not,try use explicit
> >> > mapjoin hint.
> >> >
> >> >
> >> >> - using SET hive.optimize.skewjoin = true;
> >> >> but nothing helped :(
> >> >> Anything else I can try?
> >> >> Thanks!
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Bharath .V
> >> w:http://research.iiit.ac.in/~bharath.v
> >
> >
>
> The wiki does not allow images; Confluence does, but we have not moved there
> yet.
>
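For reference, the explicit map-join hint suggested in the quoted thread would look roughly like this on Igor's query (the select list is abbreviated as in the original); naming the small table in the hint lets it be replicated to every mapper so the skewed reduce phase is avoided:

  SELECT /*+ MAPJOIN(s) */ ...
  FROM T t
  LEFT OUTER JOIN S s
    ON t.timestamp = s.timestamp AND t.id = s.id;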


Re: skew join optimization

2011-03-20 Thread Ted Yu
Can someone re-attach the missing figures for that wiki ?

Thanks

On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada <
bharathvissapragada1...@gmail.com> wrote:

> Hi Igor,
>
> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the
> jira 1642 which automatically converts a normal join into map-join
> (Otherwise you can specify the mapjoin hints in the query itself.).
> Because your 'S' table is very small , it can be replicated across all
> the mappers and the reduce phase can be avoided. This can greatly
> reduce the runtime .. (See the results section in the page for
> details.).
>
> Hope this helps.
>
> Thanks
>
>
> On Sun, Mar 20, 2011 at 6:37 PM, Jov  wrote:
> > 2011/3/20 Igor Tatarinov :
> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> >> because of a single reduce task that gets the bulk of the work:
> >> SELECT ...
> >> FROM T
> >> LEFT OUTER JOIN S
> >> ON T.timestamp = S.timestamp and T.id = S.id
> >> This is a 1:0/1 join so the size of the output is exactly the same as
> the
> >> size of T (500M records). S is actually very small (5K).
> >> I've tried:
> >> - switching the order of the join conditions
> >> - using a different hash function setting (jenkins instead of murmur)
> >> - using SET set hive.auto.convert.join = true;
> >
> > are you sure your query convert to mapjoin? if not,try use explicit
> > mapjoin hint.
> >
> >
> >> - using SET hive.optimize.skewjoin = true;
> >> but nothing helped :(
> >> Anything else I can try?
> >> Thanks!
> >
>
>
>
> --
> Regards,
> Bharath .V
> w:http://research.iiit.ac.in/~bharath.v
>


Re: does hive support Sequence File format ?

2011-02-17 Thread Ted Yu
Look under
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table

On Thu, Feb 17, 2011 at 12:00 PM, Mapred Learn wrote:

> Hi,
> I was wondering if Hive supports the SequenceFile format. If yes, could you
> point me to some documentation about how to use SequenceFiles in Hive?
>
> Thanks,
> -JJ
>
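It does: STORED AS SEQUENCEFILE in the table DDL is all that is needed. A minimal illustrative example (table and column names are made up):

  CREATE TABLE events (id INT, payload STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS SEQUENCEFILE;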


Re: Partitioning External table

2010-12-29 Thread Ted Yu
Can you try using:
location 'dt=1/engine'

Cheers

On Wed, Dec 29, 2010 at 1:12 AM, David Ginzburg  wrote:

>  Hi,
> Thank you for the reply.
> I tried  ALTER TABLE tpartitions ADD PARTITION (dt='1') LOCATION
> '/user/training/partitions/';
> SHOW PARTITIONS
> tpartitions;
>
> OK
> dt=1
>
>
>  but when I try to issue a select query , I get the following error:
>
> hive> select count(value) from tpartitions where dt='1';
> Total MapReduce jobs = 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Job Submission failed with exception 'java.io.FileNotFoundException(File
> does not exist: hdfs://localhost:8022/user/training/partitions/dt=1/data)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
> Why is it looking for a data file when my sequence file is located at
> /user/training/partitions/dt=1/engine, according to the partition?
>
>
>
>
>
>
> > Date: Tue, 28 Dec 2010 11:25:50 -0500
> > Subject: Re: Partitioning External table
> > From: edlinuxg...@gmail.com
> > To: user@hive.apache.org
>
> >
> > On Tue, Dec 28, 2010 at 9:41 AM, David Ginzburg 
> wrote:
> > > Hi,
> > > I am trying to test  creation of  an external table using partitions,
> > > my files on hdfs are:
> > >
> > > /user/training/partitions/dt=2/engine
> > > /user/training/partitions/dt=2/engine
> > >
> > > engine are sequence files which I have managed to create externally
> and
> > > query from, when I have not used partitions.
> > >
> > > When I create with partitions using :
> > > hive> CREATE EXTERNAL TABLE tpartitions(value STRING) PARTITIONED BY
> (dt
> > > STRING) STORED AS INPUTFORMAT
> > > 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' OUTPUTFORMAT
> > > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION
> > > '/user/training/partitions';
> > > OK
> > > Time taken: 0.067 seconds
> > >
> > > show partitions
> > > tpartitions;
> > > OK
> > > Time taken: 0.084 seconds
> > > hive> select * from tpartitions;
> > > OK
> > > Time taken: 0.139 seconds
> > >
> > > Can someone point to what am I doing wrong here?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> > You need to explicitly add the partitions to the table. The location
> > specified for the partition will be appended to the location of the
> > table.
> >
> > http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Partitions
> >
> > Something like this:
> > alter table tpartitions add partition (dt='2') location 'dt=2/engine';
> > alter table tpartitions add partition (dt='3') location 'dt=3/engine';
>


Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

2010-12-20 Thread Ted Yu
Have you found anything interesting from Hive history file
(/tmp/hadoop/hive_job_log_hadoop_201012210353_775358406.txt) ?

Thanks

On Mon, Dec 20, 2010 at 6:11 PM, Sean Curtis  wrote:

> Just running a simple select count(1) from a table (using MovieLens as an
> example) doesn't seem to work for me. Anyone know why this doesn't work? I'm
> using Hive trunk:
>
> hive> select avg(rating) from movierating where movieid=43;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>  set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>  set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>  set mapred.reduce.tasks=
> Starting Job = job_201012141048_0023, Tracking URL =
> http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
> Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop job
>  -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
> 2010-12-20 15:15:03,295 Stage-1 map = 0%,  reduce = 0%
> 2010-12-20 15:15:09,420 Stage-1 map = 50%,  reduce = 0%
> ...
> eventually fails after a couple of minutes with:
>
> 2010-12-20 17:33:01,113 Stage-1 map = 100%,  reduce = 0%
> 2010-12-20 17:33:32,182 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201012141048_0023 with errors
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
> hive>
>
>
> It almost seems like the reduce task never starts. Any help would be
> appreciated.
>
> sean


Re: Hive produces very small files despite hive.merge...=true settings

2010-11-18 Thread Ted Yu
Leo:
You may find this helpful:
http://indoos.wordpress.com/2010/06/24/hive-remote-debugging/

On Thu, Nov 18, 2010 at 2:57 PM, Leo Alekseyev  wrote:

> Hi Ning,
> For the dataset I'm experimenting with, the total size of the output
> is 2mb, and the files are at most a few kb in size.  My
> hive.input.format was set to default HiveInputFormat; however, when I
> set it to CombineHiveInputFormat, it only made the first stage of the
> job use fewer mappers.  The merge job was *still* filtered out at
> runtime.  I also tried set hive.mergejob.maponly=false; that didn't
> have any effect.
>
> I am a bit at a loss what to do here.  Is there a way to see what's
> going on exactly using e.g. debug log levels?..  Btw, I'm also using
> dynamic partitions; could that somehow be interfering with the merge
> job?..
>
> I'm running a relatively fresh Hive from trunk (built maybe a month ago).
>
> --Leo
>
> On Thu, Nov 18, 2010 at 1:12 PM, Ning Zhang  wrote:
> > The settings looks good. The parameter hive.merge.size.smallfiles.avgsize
> is used to determine at run time if a merge should be triggered: if the
> average size of the files in the partition is SMALLER than the parameter and
> there are more than 1 file, the merge should be scheduled. Can you try to
> see if you have any big files as well in your resulting partition? If it is
> because of a very large file, you can set the parameter large enough.
> >
> > Another possibility is that your Hadoop installation does not support
> CombineHiveInputFormat, which is used for the new merge job. Someone
> reported previously merge was not successful because of this. If that's the
> case, you can turn off CombineHiveInputFormat and use the old
> HiveInputFormat (though slower) by setting hive.mergejob.maponly=false.
> >
> > Ning
> > On Nov 17, 2010, at 6:00 PM, Leo Alekseyev wrote:
> >
> >> I have jobs that sample (or generate) a small amount of data from a
> >> large table.  At the end, I get e.g. about 3000 or more files of 1kb
> >> or so.  This becomes a nuisance.  How can I make Hive do another pass
> >> to merge the output?  I have the following settings:
> >>
> >> hive.merge.mapfiles=true
> >> hive.merge.mapredfiles=true
> >> hive.merge.size.per.task=25600
> >> hive.merge.size.smallfiles.avgsize=1600
> >>
> >> After setting hive.merge* to true, Hive started indicating "Total
> >> MapReduce jobs = 2".  However, after generating the
> >> lots-of-small-files table, Hive says:
> >> Ended Job = job_201011021934_1344
> >> Ended Job = 781771542, job is filtered out (removed at runtime).
> >>
> >> Is there a way to force the merge, or am I missing something?
> >> --Leo
> >
> >
>


Re: EXTERNAL:Re: unable to create table

2010-11-16 Thread Ted Yu
The class is in antlr-runtime-3.0.1.jar
Try finding it under 

On Tue, Nov 16, 2010 at 11:43 AM, Gerlach, Hannah L (IS) <
hannah.gerl...@ngc.com> wrote:

> Dear Ted,
>
>
>
> Maybe  I am missing something, but ‘Exception in hive startup’ appears to
> be a different problem.
>
>
>
> When I run /bin/hive, it starts fine.  The problem arises when
> I try to create a table once hive is running.
>
>
>
> Best,
>
> Hannah
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* Tuesday, November 16, 2010 2:16 PM
> *To:* user@hive.apache.org
> *Subject:* EXTERNAL:Re: unable to create table
>
>
>
> See 'Exception in hive startup' discussion - especially Edward's response
> on Oct 13th.
>
> On Tue, Nov 16, 2010 at 9:40 AM, Gerlach, Hannah L (IS) <
> hannah.gerl...@ngc.com> wrote:
>
> Hello,
>
>
>
> I am completely new to hive and I need some help.  Hive starts fine, but
> when I try to create a table, I get an error.  See below.
>
>
>
> hive> CREATE TABLE pokes (foo INT, bar STRING);
>
>
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/antlr/runtime/tree/TreeAdaptor
>
> at
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:314)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:635)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:353)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Caused by: java.lang.ClassNotFoundException:
> org.antlr.runtime.tree.TreeAdaptor
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at
> java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:303)
>
> at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>
> at
> java.lang.ClassLoader.loadClassInternal(ClassLoader.java:316)
>
> ... 10 more
>
>
>
> Ideas?
>
>
>
> Best,
>
> Hannah
>
>
>


Re: unable to create table

2010-11-16 Thread Ted Yu
See 'Exception in hive startup' discussion - especially Edward's response on
Oct 13th.

On Tue, Nov 16, 2010 at 9:40 AM, Gerlach, Hannah L (IS) <
hannah.gerl...@ngc.com> wrote:

> Hello,
>
>
>
> I am completely new to hive and I need some help.  Hive starts fine, but
> when I try to create a table, I get an error.  See below.
>
>
>
> hive> CREATE TABLE pokes (foo INT, bar STRING);
>
>
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/antlr/runtime/tree/TreeAdaptor
>
> at
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:314)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:635)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:140)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:199)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:353)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Caused by: java.lang.ClassNotFoundException:
> org.antlr.runtime.tree.TreeAdaptor
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at
> java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:303)
>
> at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>
> at
> java.lang.ClassLoader.loadClassInternal(ClassLoader.java:316)
>
> ... 10 more
>
>
>
> Ideas?
>
>
>
> Best,
>
> Hannah
>


Re: Hive Getting Started Wiki assumes $CLASSPATH at end of HADOOP_CLASSPATH

2010-11-08 Thread Ted Yu
Please see Edward's reply to 'Exception in hive startup' on Oct 13th.
Try running with /bin/hive

On Mon, Nov 8, 2010 at 7:02 PM, Stuart Smith  wrote:

> Hello,
>
>   I'm just starting with hive, and I ran into a newbie problem that didn't
> have a solution via google. So I thought I'd record the solution for
> posterity (and other hapless newbies) :)
>
> I've been using hadoop/hbase for a while, and have configured hadoop-env.sh
> a bit here and there (to work with hbase, etc). At some point, I dropped the
> $CLASSPATH off the end of the standard line:
>
> export
> HADOOP_CLASSPATH=/home/stu/hbase/hbase-0.20.6.jar:/home/stu/hbase/hbase-0.20.6-test.jar:/home/stu/hbase/conf:/home/stu/hbase/lib/zookeeper-3.2.2.jar:$CLASSPATH
>
> So it became:
>
> # Extra Java CLASSPATH elements.  Optional.
> export
> HADOOP_CLASSPATH=/home/stu/hbase/hbase-0.20.6.jar:/home/stu/hbase/hbase-0.20.6-test.jar:/home/stu/hbase/conf:/home/stu/hbase/lib/zookeeper-3.2.2.jar
>
> (probably when I added the hbase stuff or something). My hadoop/hbase set
> up runs fine, so I never noticed.
>
> Well, if you do that, and you try to run the hive shell, you get the:
>
> s...@ubuntu-update:~/hive-0.6.0/bin/ext$ /home/stu/hadoop-0.20.2/bin/hadoop
> jar /home/stu/hive-0.6.0/lib/hive-cli-0.6.0.jar
> org.apache.hadoop.hive.cli.CliDriver
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/hive/conf/HiveConf
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:247)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.conf.HiveConf
>at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>... 3 more
>
> error, even if you've followed the wiki correctly and set HADOOP_HOME and
> HIVE_HOME correctly. Note the command line above is a little strange,
> because I was debugging through the $HIVE_HOME/bin/hive script... (So I
> printed out the classpath it was forming, set it by hand, ran the
> instructions by hand, etc).
>
> This is installing from the hive tar (stable). But that doesn't matter.
>
> Anyways, hope the answer helps someone..
>
> Best,
>  -stu
>
>
>
>
>
>


Re: Exception in hive startup

2010-10-13 Thread Ted Yu
This should be documented in README.txt

On Wed, Oct 13, 2010 at 6:14 PM, Steven Wong  wrote:

>  You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive.
>
>
>
>
>
> *From:* hdev ml [mailto:hde...@gmail.com]
> *Sent:* Wednesday, October 13, 2010 2:18 PM
> *To:* hive-u...@hadoop.apache.org
> *Subject:* Exception in hive startup
>
>
>
> Hi all,
>
> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>
> I followed all the instructions on Hive's getting started page for setting
> up environment variables like HADOOP_HOME
>
> When I run from command prompt in the hive installation folder as
> "bin/hive" it gives me following exception
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/hive/conf/HiveConf
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.conf.HiveConf
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> ... 3 more
>
> Please note that my Hadoop installation is working fine.
>
> What could be the cause of this? Anybody has any idea?
>
> Thanks
> Harshad
>


Re: Exception in hive startup

2010-10-13 Thread Ted Yu
Make sure hive-exec-0.7.0.jar is under lib/ dir.

Found: HiveConf
Class: org.apache.hadoop.hive.conf.HiveConf
Package: org.apache.hadoop.hive.conf
Library Name: hive-exec-0.7.0.jar
Library Path: /Users/tyu/hive/lib/hive-exec-0.7.0.jar

On Wed, Oct 13, 2010 at 2:17 PM, hdev ml  wrote:

> Hi all,
>
> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>
> I followed all the instructions on Hive's getting started page for setting
> up environment variables like HADOOP_HOME
>
> When I run from command prompt in the hive installation folder as
> "bin/hive" it gives me following exception
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/hive/conf/HiveConf
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.conf.HiveConf
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> ... 3 more
>
> Please note that my Hadoop installation is working fine.
>
> What could be the cause of this? Anybody has any idea?
>
> Thanks
> Harshad
>


LzoCodec for hadoop 0.20.x

2010-10-08 Thread Ted Yu
Hi,
I am using hive trunk with default setting along with cdh3b2.
I encounter the following exception:

Caused by: java.lang.IllegalArgumentException: Compression codec
com.hadoop.compression.lzo.LzoCodec not found.
at
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
at
org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
at
org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
... 27 more
Caused by: java.lang.ClassNotFoundException:
com.hadoop.compression.lzo.LzoCodec
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)

I found the following source files:
./build/hadoopcore/hadoop-0.17.2.1/src/java/org/apache/hadoop/io/compress/LzoCodec.java
./build/hadoopcore/hadoop-0.18.3/src/core/org/apache/hadoop/io/compress/LzoCodec.java
./build/hadoopcore/hadoop-0.19.0/src/core/org/apache/hadoop/io/compress/LzoCodec.java

My question is where is the LzoCodec.java for hadoop 0.20.x ?

Thanks


exception in admin_list_jobs.jsp

2010-10-07 Thread Ted Yu
Hi,
I compiled hive from trunk.
ciq.com:/hwi/admin_list_jobs.jsp gave me:

org.apache.jasper.JasperException: Unable to compile class for JSP

An error occurred at line: 24 in the jsp file: /admin_list_jobs.jsp
Generated servlet error:
The field ExecDriver.runningJobKillURIs is not visible

Has anyone seen this ?