Re: binary file deserialization
bq. there is a varying number of items for that record

If the combination of items is very large, using a case class would be tedious.

On Wed, Mar 9, 2016 at 9:57 AM, Saurabh Bajaj wrote:
> You can load that binary up as a String RDD, then map over that RDD and
> convert each row to your case class representing the data. In the map stage
> you could also map the input string into an RDD of JSON values and use the
> following function to convert it into a DF:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
>
> val anotherPeople = sqlContext.read.json(anotherPeopleRDD)
>
> On Wed, Mar 9, 2016 at 9:15 AM, Ruslan Dautkhanov wrote:
>> We have a huge binary file in a custom serialization format (e.g. the header
>> tells the length of the record, then there is a varying number of items for
>> that record). This is produced by an old C++ application.
>> What would be the best approach to deserialize it into a Hive table or a
>> Spark RDD?
>> The format is known and well documented.
>>
>> --
>> Ruslan Dautkhanov
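For the record-parsing step itself, here is a minimal Scala sketch. The record
layout is hypothetical (a 4-byte little-endian item count followed by that many
8-byte doubles), since the actual documented format isn't shown in the thread.
Note that sc.binaryFiles loads each file whole and unsplit, so a single huge
file would instead need a custom Hadoop InputFormat that understands the record
boundaries:

    import java.nio.{ByteBuffer, ByteOrder}
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical layout: [4-byte item count][count x 8-byte doubles], repeated.
    // No handling of malformed trailing bytes -- this is only a sketch.
    def parseRecords(bytes: Array[Byte]): Seq[Array[Double]] = {
      val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
      val records = ArrayBuffer.empty[Array[Double]]
      while (buf.remaining() >= 4) {
        val count = buf.getInt()
        records += Array.fill(count)(buf.getDouble())
      }
      records
    }

    // Each (path, stream) pair is one unsplit file.
    val records = sc.binaryFiles("hdfs:///data/legacy/*.bin")
      .flatMap { case (_, stream) => parseRecords(stream.toArray()) }

From there, mapping each Array[Double] to a case class (or Row plus schema)
gives you a DataFrame you can register against Hive.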
Re: error while defining custom schema in Spark 1.5.0
The error was due to the "blank" field being defined twice.

On Tue, Dec 22, 2015 at 12:03 AM, Divya Gehlot wrote:
> Hi,
> I am a newbie to Apache Spark, using the CDH 5.5 QuickStart VM with Spark
> 1.5.0. I am working on a custom schema and getting an error:
>
>>> scala> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.sql.hive.HiveContext
>>>
>>> scala> import org.apache.spark.sql.hive.orc._
>>> import org.apache.spark.sql.hive.orc._
>>>
>>> scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
>>> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
>>>
>>> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>> 15/12/21 23:41:53 INFO hive.HiveContext: Initializing execution hive, version 1.1.0
>>> 15/12/21 23:41:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.5.0
>>> 15/12/21 23:41:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.5.0
>>> hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@214bd538
>>>
>>> scala> val customSchema = StructType(Seq(StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true), StructField("comment", StringType, true), StructField("blank", StringType, true)))
>>> customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,IntegerType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))
>>>
>>> scala> val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType, true).add("comment", StringType, true).add("blank", StringType, true)
>>> customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,IntegerType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))
>>>
>>> scala> val customSchema = StructType( StructField("year", IntegerType, true) :: StructField("make", StringType, true) :: StructField("model", StringType, true) :: StructField("comment", StringType, true) :: StructField("blank", StringType, true) :: StructField("blank", StringType, true))
>>> <console>:24: error: value :: is not a member of org.apache.spark.sql.types.StructField
>>>    val customSchema = StructType( StructField("year", IntegerType, true) :: ...
>
> I also tried the following:
>
> scala> val customSchema = StructType( StructField("year", IntegerType, true), StructField("make", StringType, true), StructField("model", StringType, true), StructField("comment", StringType, true), StructField("blank", StringType, true), StructField("blank", StringType, true))
> <console>:24: error: overloaded method value apply with alternatives:
>   (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
>   (fields: java.util.List[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
>   (fields: Seq[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType
>  cannot be applied to (org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField, org.apache.spark.sql.types.StructField)
>
> I would really appreciate it if somebody could share an example that works
> with Spark 1.4 or Spark 1.5.0.
>
> Thanks,
> Divya
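For reference, a sketch of the list-style construction that compiles: the ::
chain has to end in Nil to form a Seq[StructField], and each field name should
be unique (the failing attempts define "blank" twice):

    import org.apache.spark.sql.types._

    val customSchema = StructType(
      StructField("year", IntegerType, true) ::
      StructField("make", StringType, true) ::
      StructField("model", StringType, true) ::
      StructField("comment", StringType, true) ::
      StructField("blank", StringType, true) :: Nil)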
Re: Intermittent BindException during long MR jobs
Krishna:
Please take a look at: http://wiki.apache.org/hadoop/BindException

Cheers

On Thu, Feb 26, 2015 at 10:30 PM, hadoop.supp...@visolve.com wrote:

Hello Krishna,

The exception seems to be IP specific. It might be caused by the
unavailability of an IP address on the system to assign to the socket.
Double-check the IP address availability and rerun the job.

*Thanks,*
*S.RagavendraGanesh*
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com
email: servi...@visolve.com | Phone: 408-850-2243

*From:* Krishna Rao [mailto:krishnanj...@gmail.com]
*Sent:* Thursday, February 26, 2015 9:48 PM
*To:* user@hive.apache.org; u...@hadoop.apache.org
*Subject:* Intermittent BindException during long MR jobs

Hi, we occasionally run into a BindException that causes long-running jobs to
fail. The stack trace is below. Any ideas what this could be caused by?

Cheers,

Krishna

Stack trace:

379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)'
java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
    at org.apache.hadoop.ipc.Client.call(Client.java:1242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy10.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
    at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at com.sun.proxy.$Proxy11.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
    at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
    at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
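Since the failing bind is to port 0 (an ephemeral port), one cause the linked
wiki page covers is running out of ephemeral ports on the submitting host
during long jobs. A quick way to inspect that on Linux (a diagnostic sketch;
whether this is the culprit here is an assumption):

    # Range of ephemeral ports the kernel will hand out
    cat /proc/sys/net/ipv4/ip_local_port_range
    # Rough count of ports currently tied up in TIME_WAIT
    netstat -an | grep -c TIME_WAIT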
Re: unsubscribe
You have to send a mail to user-unsubscr...@hive.apache.org

On Thu, Jul 18, 2013 at 1:30 PM, Beau Rothrock beau.rothr...@lookout.com wrote:
Re: getRegion method is missing from RegionCoprocessorEnvironment class in version 0.94
Which release of 0.94 do you use? For the tip of 0.94, I see:

public interface RegionCoprocessorEnvironment extends CoprocessorEnvironment {
  /** @return the region associated with this coprocessor */
  public HRegion getRegion();

On Thu, Jul 11, 2013 at 6:40 PM, ch huang justlo...@gmail.com wrote:

What should I change in the example code, which is based on the 0.92 API, so
that it can run on 0.94?
Re: Variable resolution Fails
Naidu:
Please don't hijack an existing thread. Your questions are not directly
related to Hive.

Cheers

On May 1, 2013, at 12:53 AM, Naidu MS sanyasinaidu.malla...@gmail.com wrote:

Hi, I have two questions regarding HDFS and the jps utility. I am new to
Hadoop and started learning Hadoop in the past week.

1. Whenever I run start-all.sh and then jps in the console, it shows the
started processes:

naidu@naidu:~/work/hadoop-1.0.4/bin$ jps
22283 NameNode
23516 TaskTracker
26711 Jps
22541 DataNode
23255 JobTracker
22813 SecondaryNameNode
Could not synchronize with target

Along with the list of started processes, it always shows "Could not
synchronize with target" in the jps output. What does "Could not synchronize
with target" mean? Can someone explain why this is happening?

2. Is it possible to format the namenode multiple times? When I enter the
namenode -format command, it does not format the namenode and shows the
following output:

naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = naidu/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
/
Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y
Format aborted in /home/naidu/dfs/namenode
13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1
/

Can someone help me understand this? Why is it not possible to format the
namenode multiple times?

On Wed, May 1, 2013 at 8:14 AM, Sanjay Subramanian
sanjay.subraman...@wizecommerce.com wrote:

+1, agreed.

Also, as a general scripting practice, I check that the variables I am going
to use are non-empty before using them...nothing related to Hive scripts:

if [ "${freq}" == "" ]; then
  echo "variable freq is empty...exiting"
  exit 1
fi

From: Anthony Urso antho...@cs.ucla.edu
Reply-To: user@hive.apache.org
Date: Tuesday, April 30, 2013 7:20 PM
To: user@hive.apache.org, sumit ghosh sumi...@yahoo.com
Subject: Re: Variable resolution Fails

Your shell is expanding the variable ${env:freq}, which doesn't exist in the
shell's environment, so Hive is getting the empty string in that place.

If you are always intending to run your query like this, just use ${freq},
which will be expanded as expected by bash and then passed to Hive.

Cheers,
Anthony

On Tue, Apr 30, 2013 at 4:40 PM, sumit ghosh sumi...@yahoo.com wrote:

Hi,

The following variable freq fails to resolve:

bash-4.1$ export freq=MNTH
bash-4.1$ echo $freq
MNTH
bash-4.1$ hive -e "select ${env:freq} as dr from dual"
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/hadoop1/hive_querylog/sumighos/hive_job_log_sumighos_201304302321_1867815625.txt
FAILED: ParseException line 1:8 cannot recognize input near 'as' 'dr' 'from' in select clause
bash-4.1$

Here dual is a table with one row. What am I doing wrong?

When I try to resolve freq, it is empty:

$ hive -S -e "select '${env:freq}' as dr from dual"
$

Thanks,
Sumit
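Following Anthony's explanation, a sketch of an invocation that should
resolve: single-quote the whole query so bash passes ${env:freq} through
untouched, and Hive substitutes it from the (exported) environment:

    export freq=MNTH
    hive -e 'select "${env:freq}" as dr from dual'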
Re: no data in external table
Can you tell us how you created the mapping for the existing table?

In the task log, do you see any connection attempt to HBase?

Cheers

On Thu, Oct 4, 2012 at 11:30 AM, alx...@aim.com wrote:

Hello,

I use hive-0.9.0 with hadoop-0.20.2 and hbase-0.92.1. I have created an
external table, mapping it to an existing table in HBase. When I do

select * from myexternaltable

it returns no results, although scan in HBase shows data, and I do not see
any errors in the jobtracker log. Any ideas how to debug this issue?

Thanks.
Alex.
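For comparison, a minimal sketch of such a mapping (the HBase table name,
column family, and qualifier here are placeholders, not taken from the
thread):

    CREATE EXTERNAL TABLE myexternaltable(rowkey STRING, val STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
    TBLPROPERTIES ("hbase.table.name" = "existing_hbase_table");

If the column-family/qualifier names in hbase.columns.mapping don't match what
scan shows in the HBase shell, selects return empty rather than failing, which
matches the symptom described.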
Re: HBase aggregate query
Hi,

Are you able to get the number you want through the Hive log?

Thanks

On Mon, Sep 10, 2012 at 7:03 AM, iwannaplay games funnlearnfork...@gmail.com wrote:

Hi,

I want to run a query like

select month(eventdate), scene, count(1), sum(timespent)
from eventlog
group by month(eventdate), scene

in HBase. Through Hive it takes a lot of time for 40 million records. Is
there any syntax in HBase to compute this result? In SQL Server it takes
around 9 minutes; how long might it take in HBase?

Regards
Prabhjot
Re: HBaseSerDe
The ctor is used in TestHBaseSerDe.java

So maybe change it to package private?

On Wed, Jul 25, 2012 at 12:43 PM, kulkarni.swar...@gmail.com wrote:

While going through some code for HBase/Hive integration, I came across this
constructor:

public HBaseSerDe() throws SerDeException {
}

Basically, the constructor is doing nothing but declaring a thrown exception.
The problem is that fixing this now would be a non-passive change. I couldn't
really find an obvious reason for it to be there. Are there any objections if
I file a JIRA to remove this constructor?

--
Swarnim
Re: What virtual column does hive.exec.rowoffset add?
From VirtualColumn.java:

  public static VirtualColumn ROWOFFSET = new VirtualColumn("ROW__OFFSET__INSIDE__BLOCK",
      (PrimitiveTypeInfo) TypeInfoFactory.longTypeInfo);

Cheers

On Sun, Mar 25, 2012 at 2:30 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

<property>
  <name>hive.exec.rowoffset</name>
  <value>false</value>
  <description>Whether to provide the row offset virtual column</description>
</property>

I know the others are INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE.
Does anyone know what this third column is?

Edward
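A usage sketch, once the setting is enabled for the session (src here is a
placeholder table with a key column):

    set hive.exec.rowoffset=true;
    SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE,
           ROW__OFFSET__INSIDE__BLOCK, key
    FROM src LIMIT 5;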
Re: Running Hive Queries in Parallel?
Set it in conf/hive-site.xml

On Thu, Jul 7, 2011 at 10:59 PM, hadoop n00b new2h...@gmail.com wrote:

Found *hive.exec.parallel*!!! How can I set it to be 'true' by default?

Thanks!

On Fri, Jul 8, 2011 at 11:14 AM, hadoop n00b new2h...@gmail.com wrote:

Hello,

When I execute multiple queries in Hive, the mapred tasks are queued up and
executed one by one. Is there a way I could set Hive or Hadoop to execute
mapred tasks in parallel?

I am running Hive 0.4.1 on Hadoop 0.20.

Tx!
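A minimal hive-site.xml entry for that:

    <property>
      <name>hive.exec.parallel</name>
      <value>true</value>
    </property>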
Re: URGENT: I need the Hive Server setup Wiki
The wiki has moved. See
https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer

On Mon, Jun 27, 2011 at 2:31 PM, Ayon Sinha ayonsi...@yahoo.com wrote:

https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
is empty and the old link is gone.

-Ayon
See My Photos on Flickr http://www.flickr.com/photos/ayonsinha/
Also check out my Blog for answers to commonly asked questions. http://dailyadvisor.blogspot.com
Re: How to change the hive.metastore.warehouse.dir ?
Can you try as user hadoop?

Cheers

On May 17, 2011, at 9:53 PM, jinhang du dujinh...@gmail.com wrote:

hi,
The default value is /user/hive/warehouse in hive-site.xml. After I changed
the directory to a path on HDFS, I got this exception:

FAILED: Error in metadata: MetaException(message:Got exception:
org.apache.hadoop.security.AccessControlException
org.apache.hadoop.security.AccessControlException: Permission denied:
user=root, access=WRITE, inode=output
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask

Is this failure related to hadoop-site.xml or something else?
Thanks for your help.

--
dujinhang
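Alternatively, a sketch of opening up the new warehouse path first (assumes
you can run the commands as the HDFS superuser; substitute the path you set in
hive.metastore.warehouse.dir):

    hadoop fs -mkdir /path/to/new/warehouse
    hadoop fs -chmod g+w /path/to/new/warehouse

The AccessControlException shows the query ran as user=root without WRITE
access to the new directory, so either the permissions or the querying user
has to change.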
Re: Hi, all. Is it possible to generate multiple records in one SerDe?
I don't think so:

Object deserialize(Writable blob) throws SerDeException;

On Mon, Mar 21, 2011 at 4:55 AM, 幻 ygnhz...@gmail.com wrote:

Hi, all. Is it possible to generate multiple records in one SerDe? I mean,
can I return more than one row from deserialize?

Thanks!
Re: skew join optimization
Can someone re-attach the missing figures for that wiki?

Thanks

On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
bharathvissapragada1...@gmail.com wrote:

Hi Igor,

See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the JIRA 1642,
which automatically converts a normal join into a map join (otherwise you can
specify the mapjoin hints in the query itself). Because your 'S' table is very
small, it can be replicated across all the mappers and the reduce phase can be
avoided. This can greatly reduce the runtime. (See the results section on that
page for details.) Hope this helps.

Thanks

On Sun, Mar 20, 2011 at 6:37 PM, Jov zhao6...@gmail.com wrote:

2011/3/20 Igor Tatarinov i...@decide.com:

I have the following join that takes 4.5 hours (with 12 nodes), mostly
because of a single reduce task that gets the bulk of the work:

SELECT ...
FROM T LEFT OUTER JOIN S
ON T.timestamp = S.timestamp and T.id = S.id

This is a 1:0/1 join, so the size of the output is exactly the same as the
size of T (500M records). S is actually very small (5K).

I've tried:
- switching the order of the join conditions
- using a different hash function setting (jenkins instead of murmur)
- using SET hive.auto.convert.join = true;

are you sure your query converted to a mapjoin? if not, try an explicit
mapjoin hint.

- using SET hive.optimize.skewjoin = true;

but nothing helped :(

Anything else I can try?
Thanks!

--
Regards,
Bharath .V
w: http://research.iiit.ac.in/~bharath.v
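For reference, the explicit hint form mentioned above would look like this
against the thread's query (a sketch; MAPJOIN names the small table to
replicate to the mappers, and the select list stays whatever it was):

    SELECT /*+ MAPJOIN(S) */ ...
    FROM T LEFT OUTER JOIN S
    ON T.timestamp = S.timestamp AND T.id = S.id;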
Re: skew join optimization
How about linking to http://imageshack.us/ or TinyPic?

Thanks

On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu yuzhih...@gmail.com wrote:
> Can someone re-attach the missing figures for that wiki?

The wiki does not allow images; Confluence does, but we have not moved there
yet.
Re: does hive support Sequence File format ?
Look under
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table

On Thu, Feb 17, 2011 at 12:00 PM, Mapred Learn mapred.le...@gmail.com wrote:

Hi,
I was wondering if Hive supports the SequenceFile format. If yes, could you
point me to some documentation about how to use SequenceFiles in Hive?

Thanks,
-JJ
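It does; a minimal sketch following that DDL page (table and column names are
placeholders):

    CREATE TABLE seq_demo (key STRING, value STRING)
    STORED AS SEQUENCEFILE;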
Re: Partitioning External table
Can you try using:

location 'dt=1/engine'

Cheers

On Wed, Dec 29, 2010 at 1:12 AM, David Ginzburg ginz...@hotmail.com wrote:

Hi,
Thank you for the reply. I tried:

ALTER TABLE tpartitions ADD PARTITION (dt='1') LOCATION '/user/training/partitions/';
SHOW PARTITIONS tpartitions;
OK
dt=1

but when I try to issue a select query, I get the following error:

hive> select count(value) from tpartitions where dt='1';
Total MapReduce jobs = 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:8022/user/training/partitions/dt=1/data)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver

Why is it looking for the data file when my sequence file is located at
/user/training/partitions/dt=1/engine, according to the partition?

Date: Tue, 28 Dec 2010 11:25:50 -0500
Subject: Re: Partitioning External table
From: edlinuxg...@gmail.com
To: user@hive.apache.org

On Tue, Dec 28, 2010 at 9:41 AM, David Ginzburg ginz...@hotmail.com wrote:

Hi,
I am trying to test creation of an external table using partitions. My files
on HDFS are:

/user/training/partitions/dt=1/engine
/user/training/partitions/dt=2/engine

The engine files are sequence files which I have managed to create externally
and query from when I have not used partitions. I create the table with
partitions using:

hive> CREATE EXTERNAL TABLE tpartitions(value STRING)
PARTITIONED BY (dt STRING)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/training/partitions';
OK
Time taken: 0.067 seconds
hive> show partitions tpartitions;
OK
Time taken: 0.084 seconds
hive> select * from tpartitions;
OK
Time taken: 0.139 seconds

Can someone point to what I am doing wrong here?

You need to explicitly add the partitions to the table. The location
specified for the partition will be appended to the location of the table.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Partitions

Something like this:

ALTER TABLE tpartitions ADD PARTITION (dt='2') LOCATION 'dt=2/engine';
ALTER TABLE tpartitions ADD PARTITION (dt='3') LOCATION 'dt=3/engine';
Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Have you found anything interesting in the Hive history file
(/tmp/hadoop/hive_job_log_hadoop_201012210353_775358406.txt)?

Thanks

On Mon, Dec 20, 2010 at 6:11 PM, Sean Curtis sean.cur...@gmail.com wrote:

Just running a simple select count(1) from a table (using movielens as an
example) doesn't seem to work for me. Anyone know why this doesn't work? I'm
using Hive trunk:

hive> select avg(rating) from movierating where movieid=43;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201012141048_0023, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
2010-12-20 15:15:03,295 Stage-1 map = 0%, reduce = 0%
2010-12-20 15:15:09,420 Stage-1 map = 50%, reduce = 0%
...

It eventually fails after a couple of minutes with:

2010-12-20 17:33:01,113 Stage-1 map = 100%, reduce = 0%
2010-12-20 17:33:32,182 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201012141048_0023 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

It almost seems like the reduce task never starts. Any help would be
appreciated.

sean
Re: Exception in hive startup
This should be documented in README.txt

On Wed, Oct 13, 2010 at 6:14 PM, Steven Wong sw...@netflix.com wrote:

You need to run <hive_root>/build/dist/bin/hive, not <hive_root>/bin/hive.

*From:* hdev ml [mailto:hde...@gmail.com]
*Sent:* Wednesday, October 13, 2010 2:18 PM
*To:* hive-u...@hadoop.apache.org
*Subject:* Exception in hive startup

Hi all,

I installed Hadoop 0.20.2 and Hive 0.5.0. I followed all the instructions on
Hive's getting started page for setting up environment variables like
HADOOP_HOME. When I run bin/hive from the command prompt in the Hive
installation folder, it gives me the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
    ... 3 more

Please note that my Hadoop installation is working fine. What could be the
cause of this? Does anybody have any idea?

Thanks,
Harshad