Re: Which SerDe for Custom Binary Data.
Currently we have data on NFS and we have proprietary tools to access the data. We are planning to move the data into HDFS and use HiveQL to access the data and run batch jobs. So we are looking for a custom SerDe (assuming the existing SerDes will not be able to read the underlying data) to read the data using Hive.

On Fri, Mar 13, 2015 at 10:33 AM, Mich Talebzadeh m...@peridale.co.uk wrote:

Hive as I use it is particularly useful for getting data out of relational tables and, more importantly, querying that data using HiveQL (a variation of Transact-SQL). If your data is in binary format and assuming that you manage to store it in HDFS, how are you intending to access the data? At the consumer level, what tools are you going to use? Do you have a proprietary tool with the correct drivers to access the data?

HTH

Mich Talebzadeh
http://talebzadehmich.wordpress.com
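Once such a SerDe exists, wiring it into Hive is just DDL — a minimal sketch, where com.example.MyBinarySerDe, com.example.MyBinaryInputFormat, and the columns are hypothetical placeholders for whatever the proprietary format actually needs:

-- register the jar containing the custom SerDe and input format
ADD JAR /path/to/my-binary-serde.jar;

CREATE EXTERNAL TABLE raw_events (
  event_id BIGINT,
  payload STRING
)
ROW FORMAT SERDE 'com.example.MyBinarySerDe'
STORED AS
  INPUTFORMAT 'com.example.MyBinaryInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/raw_events';

The input format's job is to split the binary files into records; the SerDe's deserialize() then maps each record onto the declared columns, after which ordinary HiveQL works against the table. The stock HiveIgnoreKeyTextOutputFormat is a placeholder here since a read-only external table never exercises its output path.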
Re: drop table command hang
It doesn't matter whether I truncate the table first; it always hangs there. Very weird.

On Wed, Mar 11, 2015 at 3:06 PM, Mich Talebzadeh m...@peridale.co.uk wrote:

Have you truncated the table before dropping it? i.e.

Truncate table table_name
Drop table table_name

Mich Talebzadeh
http://talebzadehmich.wordpress.com

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: 11 March 2015 06:56
To: user@hive.apache.org
Subject: drop table command hang

I invoke a drop table command and it hangs there. Here's the log. I am using mysql, and I can run describe commands and create tables through the mysql console, so I assume mysql works properly. Can anyone help with this? Thanks

2015-03-11 14:48:09,441 INFO [main]: ql.Driver (Driver.java:checkConcurrency(161)) - Concurrency mode is disabled, not creating a lock manager
2015-03-11 14:48:09,441 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver
2015-03-11 14:48:09,441 INFO [main]: ql.Driver (Driver.java:execute(1321)) - Starting command: drop table student_bucketed_s1
2015-03-11 14:48:09,441 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=TimeToSubmit start=1426056489421 end=1426056489441 duration=20 from=org.apache.hadoop.hive.ql.Driver
2015-03-11 14:48:09,442 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver
2015-03-11 14:48:09,442 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver
2015-03-11 14:48:09,442 INFO [main]: ql.Driver (Driver.java:launchTask(1640)) - Starting task [Stage-0:DDL] in serial mode
2015-03-11 14:48:09,442 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(743)) - 0: get_table : db=default tbl=student_bucketed_s1
2015-03-11 14:48:09,443 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(368)) - ugi=jzhang ip=unknown-ip-addr cmd=get_table : db=default tbl=student_bucketed_s1
2015-03-11 14:48:09,458 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(743)) - 0: get_table : db=default tbl=student_bucketed_s1
2015-03-11 14:48:09,458 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(368)) - ugi=jzhang ip=unknown-ip-addr cmd=get_table : db=default tbl=student_bucketed_s1
2015-03-11 14:48:09,474 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(743)) - 0: drop_table : db=default tbl=student_bucketed_s1
2015-03-11 14:48:09,474 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(368)) - ugi=jzhang ip=unknown-ip-addr cmd=drop_table : db=default tbl=student_bucketed_s1

--
Best Regards
Jeff Zhang
Re: when start hive could not generate log file
I can't find any log. I updated the log dir in hive-log4j.properties, but that didn't work either.

------ Original ------
From: Jianfeng (Jeff) Zhang jzh...@hortonworks.com
Date: Fri, Mar 13, 2015 08:50 AM
To: user@hive.apache.org
Subject: Re: when start hive could not generate log file

By default, hive.log is located in /tmp/${user}/hive.log

Best Regard,
Jeff Zhang

From: zhangjp smart...@hotmail.com
Date: Wednesday, March 11, 2015 at 7:12 PM
To: user@hive.apache.org
Subject: when start hive could not generate log file

When I run the command hive, the messages are as follows:

[@xxx/]# hive
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hive.common.LogUtils).
log4j:WARN Please initialize the log4j system properly.
Logging initialized using configuration in file:/search/apache-hive-0.13.1-bin/conf/hive-log4j.properties

My hive-log4j.properties uses the default template, but when I run find -name hive.log I can't find any such file.
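For reference, the log location is driven by a handful of keys in hive-log4j.properties — roughly these, which are the stock defaults in the 0.13 template (worth double-checking against your copy):

hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log

The "No appenders could be found" warning suggests log4j initialized before the properties file was picked up. Starting the CLI as hive --hiveconf hive.root.logger=INFO,console (a documented override) will at least dump logging to the console while you sort out where the file is supposed to land.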
errors when select * from table limit 10 through jdbc client
When I query with select col1 from table limit 10 it's OK, but when I replace col1 with * it throws errors:

Caused by: org.apache.thrift.TApplicationException: Internal error processing FetchResults
at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:329)
... 74 more
Applying UDFs on load.
Hi all, I have been loading my data into Hive as Strings, and then using the SELECT INTO statement to apply UDFs to transform my data. I was just wondering if there is a better way to do this - perhaps a way to apply UDFs on a LOAD DATA call? Or something involving the new temporary tables feature? Thanks!
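LOAD DATA only moves files into the table's directory, so it cannot apply UDFs on the way in. The usual pattern is an external staging table over the raw files plus an INSERT ... SELECT that runs the transforms — a sketch where the table, column, and UDF names are hypothetical:

-- staging table reads the raw files in place
CREATE EXTERNAL TABLE staging_raw (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/landing/raw';

-- the transform happens here, not at load time
INSERT INTO TABLE clean_table
SELECT my_parse_udf(col1), CAST(col2 AS INT)
FROM staging_raw;

On Hive 0.14+, CREATE TEMPORARY TABLE can play the staging role if you don't want the intermediate table to outlive the session.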
Re: insert table error
What is the error you get?

Daniel

On 13 March 2015, at 13:13, zhangjp smart...@hotmail.com wrote:

This case fails:

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students VALUES
('fred flintstone', 35, 1.28),
('barney rubble', 32, 2.32);
Hive on Spark
Hi all,

Recently I configured Spark 1.2.0, and my environment is Hadoop 2.6.0 with Hive 1.1.0. I have tried Hive on Spark, and while executing an insert into I am getting the following error:

Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapreduce.job.reduces=number
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

I have added the spark-assembly jar in hive lib, and also in the hive console using the add jar command, followed by these steps:

set spark.home=/opt/spark-1.2.1/;
add jar /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;
set hive.execution.engine=spark;
set spark.master=spark://xxx:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;

Can anyone suggest what's wrong?

Thanks
Regards
Amithsha
Re: insert table error
What version of Hive are you using? INSERT INTO ... VALUES is supported only from Hive 0.14 onwards.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

--
Thanks,
Raunak Jhawar
m: 09820890034

On Fri, Mar 13, 2015 at 4:45 PM, Daniel Haviv daniel.ha...@veracity-group.com wrote:

What is the error you get?

Daniel
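On releases older than 0.14, the usual workaround is to select literals through an existing table — a sketch assuming a one-row helper table named dual that you populate with a single row first (the helper table and the bucketing flag are illustrative):

-- make the insert honor the table's 2 buckets (needed pre-Hive 2.0)
set hive.enforce.bucketing=true;

INSERT INTO TABLE students
SELECT * FROM (
  SELECT 'fred flintstone' AS name, 35 AS age, CAST(1.28 AS DECIMAL(3,2)) AS gpa FROM dual
  UNION ALL
  SELECT 'barney rubble' AS name, 32 AS age, CAST(2.32 AS DECIMAL(3,2)) AS gpa FROM dual
) t;

The subquery wrapper matters on older Hive, which only accepted UNION ALL inside a subquery.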
insert table error
This case fails:

CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;

INSERT INTO TABLE students VALUES
('fred flintstone', 35, 1.28),
('barney rubble', 32, 2.32);
Re: Bucket pruning
Hi, thanks for the detailed response. I will experiment with your suggested ORC bloom filter solution.

It seems to me the obvious, most straightforward solution is to add support for hash partitioning, so I can do something like: create table T() partitioned by (x into num_partitions, ..). Upon insert, hash(x) determines which partition to put the record in. Upon select, the query processor can now hash on x and scan only that partition (this optimization will probably work only on = and other discrete filtering, but that's true for partitioning in general). It seems all of this can be done early in the query plan phase and have no effect on the underlying infra.

Regards,
Cobby.

On 12 March 2015, at 23:05, Gopal Vijayaraghavan gop...@apache.org wrote:

Hi,

No, and it's a shame because we're stuck on some compatibility details with this. The primary issue is the fact that the InputFormat is very generic and offers no way to communicate StorageDescriptor or bucketing. The split generation for something like SequenceFileInputFormat lives inside MapReduce, where it has no idea about bucketing. So InputFormat.getSplits(conf) returns something relatively arbitrary, which contains a mixture of files when CombineInputFormat is turned on.

I have implemented this twice so far for ORC (for custom Tez jobs, with huge wins) by using an MRv2 PathFilter over the regular OrcNewInputFormat implementation, turning off combine input and using Tez grouping instead. But that has proved to be very fragile for a trunk feature, since with schema evolution of partitioned tables older partitions may be bucketed with a different count from a newer partition - so the StorageDescriptor for each partition has to be fetched across before we can generate a valid PathFilter.

The SARGs are probably a better way to do this eventually, as they can implement IN_BUCKET(1,2) to indicate 1 of 2 instead of the "0_1" PathFilter, which is fragile.

Right now, the most fool-proof solution we've hit upon was to apply the ORC bloom filter to the bucket columns, which is far safer as it does not care about the DDL - it does a membership check on the actual metadata and prunes deeper at the stripe level if it is sorted as well. That is somewhat neat since this doesn't need any new options for querying - it automatically(*) kicks in for your query pattern.

Cheers,
Gopal

(*) - conditions apply - there's a threshold for file size for these filters to be evaluated during planning (to prevent HS2 from burning CPU).

From: Daniel Haviv daniel.ha...@veracity-group.com
Date: Thursday, March 12, 2015 at 2:36 AM
To: user@hive.apache.org
Subject: Bucket pruning

Hi,

We created a bucketed table, and when we select in the following way:

select * from testtble where bucket_col = 'X';

we observe that all of the table is being read and not just the specific bucket. Does Hive support such a feature?

Thanks,
Daniel
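For reference, the bloom-filter route Gopal describes is driven by table properties — a sketch where orc.bloom.filter.columns and orc.bloom.filter.fpp are the real ORC property names (shipping from Hive 1.2 onwards) and the table itself is hypothetical:

CREATE TABLE testtble (
  bucket_col STRING,
  payload STRING
)
CLUSTERED BY (bucket_col) SORTED BY (bucket_col ASC) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'bucket_col',
  'orc.bloom.filter.fpp' = '0.05'
);

-- a point lookup can then skip stripes whose bloom filter rules out 'X'
SELECT * FROM testtble WHERE bucket_col = 'X';

The SORTED BY clause is what enables the extra stripe-level pruning Gopal mentions, on top of the bloom filter's membership check.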
Re: Hive on Spark
You need to copy the spark-assembly.jar to your hive/lib. Also, you can check hive.log to get more messages.

On Fri, Mar 13, 2015 at 4:51 AM, Amith sha amithsh...@gmail.com wrote:
RE: Which SerDe for Custom Binary Data.
Hive as I use it is particularly useful for getting data out of relational tables and, more importantly, querying that data using HiveQL (a variation of Transact-SQL). If your data is in binary format and assuming that you manage to store it in HDFS, how are you intending to access the data? At the consumer level, what tools are you going to use? Do you have a proprietary tool with the correct drivers to access the data?

HTH

Mich Talebzadeh
http://talebzadehmich.wordpress.com

From: karthik maddala [mailto:karthikmaddal...@gmail.com]
Sent: 13 March 2015 15:56
To: user@hive.apache.org
Subject: Which SerDe for Custom Binary Data.

I want to set up a DW based on Hive. However, my data does not come as handy csv files but as binary files in a proprietary format. The binary files consist of data serialized using the C language. Could you please suggest which input format to use and how to write a custom SerDe for the data described above.

Thanks,
Karthik Maddala
Which SerDe for Custom Binary Data.
I want to set up a DW based on Hive. However, my data does not come as handy csv files but as binary files in a proprietary format. The binary files consist of data serialized using the C language. Could you please suggest which input format to use and how to write a custom SerDe for the data described above.

Thanks,
Karthik Maddala
Re: Which SerDe for Custom Binary Data.
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HowtoWriteYourOwnSerDe

Daniel

On 13 March 2015, at 17:56, karthik maddala karthikmaddal...@gmail.com wrote: