Re: Which SerDe for Custom Binary Data.

2015-03-13 Thread karthik maddala
Currently we have data in NFS and we have proprietary tools to access the
data.
We are planning to move the data into HDFS and use HiveQL for accessing the
data and running batch jobs.
So we are looking for a custom SerDe (assuming the existing SerDes will not
be able to read the underlying data) to read the data using Hive.
On Fri, Mar 13, 2015 at 10:33 AM, Mich Talebzadeh m...@peridale.co.uk
wrote:

 Hive as I use it is particularly useful for getting data out of relational
 tables and, more importantly, querying that data using HiveQL (a variation of
 Transact-SQL).



 If your data is in binary format and assuming that you manage to store it
 in HDFS, how are you intending to access the data? At the consumer level,
 what tools are you going to use? Do you have a proprietary tool with the
 correct drivers to access the data?



 HTH



 Mich Talebzadeh



 http://talebzadehmich.wordpress.com



 *Publications due shortly:*

 *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
 Coherence Cache*






 *From:* karthik maddala [mailto:karthikmaddal...@gmail.com]
 *Sent:* 13 March 2015 15:56
 *To:* user@hive.apache.org
 *Subject:* Which SerDe for Custom Binary Data.







 I want to set up a DW based on Hive. However, my data does not come as
 handy CSV files but as binary files in a proprietary format.



 The binary file consists of data serialized using the C language.



 Could you please suggest which input format to use and how to write a
 custom SerDe for the above-mentioned data.





 Thanks,

 Karthik Maddala







Re: drop table command hang

2015-03-13 Thread Jeff Zhang
It doesn't matter whether I truncate the table, it always hangs there. Very
weird.
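
Since the log below stops right after the drop_table call reaches the
metastore, one thing that may be worth checking (a guess on my part, not
something established in this thread) is whether another session is holding a
lock on the metastore tables in MySQL, e.g. from the mysql console:

SHOW PROCESSLIST;
SELECT * FROM information_schema.innodb_trx;
SELECT * FROM information_schema.innodb_lock_waits;

A long-running or stuck transaction touching the TBLS or PARTITIONS tables
would show up there and would block the drop.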



On Wed, Mar 11, 2015 at 3:06 PM, Mich Talebzadeh m...@peridale.co.uk
wrote:

 Have you truncated the table before dropping it?



 Truncate table table_name;
 Drop table table_name;



 Mich Talebzadeh



 http://talebzadehmich.wordpress.com



 *Publications due shortly:*

 *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
 Coherence Cache*






 *From:* Jeff Zhang [mailto:zjf...@gmail.com]
 *Sent:* 11 March 2015 06:56
 *To:* user@hive.apache.org
 *Subject:* drop table command hang



 I invoke a drop table command and it hangs there. Here's the log. I am
 using MySQL, and I can run describe and create table commands through the
 mysql console, so I assume MySQL works properly. Can anyone help with this?
 Thanks







 2015-03-11 14:48:09,441 INFO  [main]: ql.Driver
 (Driver.java:checkConcurrency(161)) - Concurrency mode is disabled, not
 creating a lock manager

 2015-03-11 14:48:09,441 INFO  [main]: log.PerfLogger
 (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=Driver.execute
 from=org.apache.hadoop.hive.ql.Driver

 2015-03-11 14:48:09,441 INFO  [main]: ql.Driver
 (Driver.java:execute(1321)) - Starting command: drop table
 student_bucketed_s1

 2015-03-11 14:48:09,441 INFO  [main]: log.PerfLogger
 (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=TimeToSubmit
 start=1426056489421 end=1426056489441 duration=20
 from=org.apache.hadoop.hive.ql.Driver

 2015-03-11 14:48:09,442 INFO  [main]: log.PerfLogger
 (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=runTasks
 from=org.apache.hadoop.hive.ql.Driver

 2015-03-11 14:48:09,442 INFO  [main]: log.PerfLogger
 (PerfLogger.java:PerfLogBegin(121)) - PERFLOG method=task.DDL.Stage-0
 from=org.apache.hadoop.hive.ql.Driver

 2015-03-11 14:48:09,442 INFO  [main]: ql.Driver
 (Driver.java:launchTask(1640)) - Starting task [Stage-0:DDL] in serial mode

 2015-03-11 14:48:09,442 INFO  [main]: metastore.HiveMetaStore
 (HiveMetaStore.java:logInfo(743)) - 0: get_table : db=default
 tbl=student_bucketed_s1

 2015-03-11 14:48:09,443 INFO  [main]: HiveMetaStore.audit
 (HiveMetaStore.java:logAuditEvent(368)) - ugi=jzhang  ip=unknown-ip-addr
  cmd=get_table : db=default tbl=student_bucketed_s1

 2015-03-11 14:48:09,458 INFO  [main]: metastore.HiveMetaStore
 (HiveMetaStore.java:logInfo(743)) - 0: get_table : db=default
 tbl=student_bucketed_s1

 2015-03-11 14:48:09,458 INFO  [main]: HiveMetaStore.audit
 (HiveMetaStore.java:logAuditEvent(368)) - ugi=jzhang  ip=unknown-ip-addr
  cmd=get_table : db=default tbl=student_bucketed_s1

 2015-03-11 14:48:09,474 INFO  [main]: metastore.HiveMetaStore
 (HiveMetaStore.java:logInfo(743)) - 0: drop_table : db=default
 tbl=student_bucketed_s1

 2015-03-11 14:48:09,474 INFO  [main]: HiveMetaStore.audit
 (HiveMetaStore.java:logAuditEvent(368)) - ugi=jzhang  ip=unknown-ip-addr
  cmd=drop_table : db=default tbl=student_bucketed_s1






 --

 Best Regards

 Jeff Zhang




-- 
Best Regards

Jeff Zhang


Re: when start hive could not generate log file

2015-03-13 Thread zhangjp
I can't find any log. I updated the log dir in hive-log4j.properties, but it
also didn't work.
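
For reference, the keys that control the log location in the stock
hive-log4j.properties template (assuming the default template shipped with
Hive 0.13.1) are roughly:

hive.root.logger=INFO,DRFA
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log

With those defaults the file ends up under the JVM temp directory of the user
running Hive, and the CLI has to be restarted after editing the file for the
change to take effect.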




-- Original --
From:  Jianfeng (Jeff) Zhang;jzh...@hortonworks.com;
Date:  Fri, Mar 13, 2015 08:50 AM
To:  user@hive.apache.org;

Subject:  Re: when start hive could not generate log file



  
 
 By default, hive.log is located in /tmp/${user}/hive.log
 
 
 
 
 Best Regards,
 Jeff Zhang
 
 
 
 
 
 
   From: zhangjp smart...@hotmail.com
 Reply-To: user@hive.apache.org user@hive.apache.org
 Date: Wednesday, March 11, 2015 at 7:12 PM
 To: user@hive.apache.org user@hive.apache.org
 Subject: when start hive could not generate log file 
 
 
 
 When I run the command hive, the message is as follows:

 [@xxx/]# hive
 log4j:WARN No appenders could be found for logger (org.apache.hadoop.hive.common.LogUtils).
 log4j:WARN Please initialize the log4j system properly.
 Logging initialized using configuration in file:/search/apache-hive-0.13.1-bin/conf/hive-log4j.properties

 My hive-log4j.properties uses the default template, but when I run find -name
 hive.log I couldn't find any file.

errors when select * from table limit 10 through jdbc client

2015-03-13 Thread zhangjp
When I query with select col1 from table limit 10 it's OK, but when I replace
col1 with * it throws errors.

Caused by: org.apache.thrift.TApplicationException: Internal error processing 
FetchResults
at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at 
org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
at 
org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
at 
org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:329)
... 74 more

Applying UDFs on load.

2015-03-13 Thread karthik ramachandran
Hi all,

I have been loading my data into Hive as strings, and then using INSERT ...
SELECT statements to apply UDFs that transform my data.

I was just wondering if there is a better way to do this - perhaps a way to
apply UDFs on a LOAD DATA call? Or something involving the new temporary
tables feature?
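
For what it's worth, one common pattern (a sketch only; the table, columns and
built-in functions below are made-up stand-ins for your own UDFs) is to load
the raw files into a temporary staging table of strings and apply the
transformations in a single INSERT ... SELECT:

CREATE TEMPORARY TABLE staging (raw_ts STRING, raw_amount STRING);

LOAD DATA INPATH '/incoming/batch1' INTO TABLE staging;

INSERT INTO TABLE final_table
SELECT cast(raw_ts AS TIMESTAMP), cast(raw_amount AS DECIMAL(10,2))
FROM staging;

The LOAD DATA step stays a pure file move, so the UDFs run only once, during
the INSERT ... SELECT.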

Thanks!


Re: insert table error

2015-03-13 Thread Daniel Haviv
What is the error you get?

Daniel

 On 13 March 2015, at 13:13, zhangjp smart...@hotmail.com wrote:
 
 This case fails:
 CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
   CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
 INSERT INTO TABLE students
   VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);


Hive on Spark

2015-03-13 Thread Amith sha
Hi all,


Recently I configured Spark 1.2.0; my environment is Hadoop 2.6.0 and Hive
1.1.0. Here I have tried Hive on Spark, and while executing an insert into I
am getting the following error.

Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=number
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=number
In order to set a constant number of reducers:
  set mapreduce.job.reduces=number
Failed to execute spark task, with exception
'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create
spark client.)'
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.spark.SparkTask



I have added the spark-assembly jar to the Hive lib directory, and also in the
Hive console using the add jar command, followed by these steps:

set spark.home=/opt/spark-1.2.1/;


add jar 
/opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;



set hive.execution.engine=spark;


set spark.master=spark://xxx:7077;


set spark.eventLog.enabled=true;


set spark.executor.memory=512m;


set spark.serializer=org.apache.spark.serializer.KryoSerializer;

Can anyone suggest a fix?



Thanks & Regards
Amithsha


Re: insert table error

2015-03-13 Thread Raunak Jhawar
What version of Hive are you using? INSERT INTO ... VALUES is supported
only from Hive 0.14 onwards.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL
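
If upgrading is not an option, a rough workaround on older releases (a sketch
only, assuming the rows are staged as a small CSV file on the local
filesystem) is to load a plain-text staging table and then INSERT ... SELECT
into the ORC table:

CREATE TABLE students_staging (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/tmp/students.csv' INTO TABLE students_staging;

SET hive.enforce.bucketing = true;
INSERT INTO TABLE students SELECT * FROM students_staging;

The hive.enforce.bucketing setting matters on releases where it still defaults
to false, so that the two buckets declared on the target table are actually
written.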


--
Thanks,
Raunak Jhawar
m: 09820890034






On Fri, Mar 13, 2015 at 4:45 PM, Daniel Haviv 
daniel.ha...@veracity-group.com wrote:

 What is the error you get?

 Daniel

 On 13 March 2015, at 13:13, zhangjp smart...@hotmail.com wrote:

 This case fails:
 CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
   CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
 INSERT INTO TABLE students
   VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);




insert table error

2015-03-13 Thread zhangjp
This case fails:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

Re: Bucket pruning

2015-03-13 Thread cobby
Hi, thanks for the detailed response.
I will experiment with your suggested ORC bloom filter solution.
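
For anyone else following the thread, the bloom-filter variant would
presumably be set up through ORC table properties at create time, along the
lines of the sketch below; the bucket count is arbitrary and the properties
assume a Hive build that already ships ORC bloom filter support:

CREATE TABLE testtble (bucket_col STRING, payload STRING)
CLUSTERED BY (bucket_col) SORTED BY (bucket_col) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns' = 'bucket_col',
               'orc.bloom.filter.fpp' = '0.05');

Sorting on the same column is what would let the filter also prune at the
stripe level, as Gopal notes below.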

It seems to me the obvious, most straightforward solution is to add support
for hash partitioning, so I could do something like:

create table T()
partitioned by (x into num_partitions, ..)

Upon insert, hash(x) determines which partition to put the record in. Upon
select, the query processor can now hash on x and scan only that partition
(this optimization will probably work only on = and other discrete filters,
but that's true for partitioning in general).
It seems all of this can be done early in the query-planning phase and have no
effect on the underlying infrastructure.

Regards,
Cobby.



 On 12 March 2015, at 23:05, Gopal Vijayaraghavan gop...@apache.org wrote:
 
 Hi,
 
 No and it's a shame because we're stuck on some compatibility details with
 this.
 
 The primary issue is the fact that the InputFormat is very generic and
 offers no way to communicate StorageDescriptor or bucketing.
 
 The split generation for something like SequenceFileInputFormat lives inside
 MapReduce, where it has no idea about bucketing.
 
 So InputFormat.getSplits(conf) returns something relatively arbitrary,
 which contains a mixture of files when CombineInputFormat is turned on.
 
 I have implemented this twice so far for ORC (for custom Tez jobs, with
 huge wins) by using an MRv2 PathFilter over the regular OrcNewInputFormat
 implementation, by turning off combine input and using Tez grouping
 instead.
 
 But that has proved to be very fragile for a trunk feature, since with
 schema evolution of partitioned tables older partitions may be bucketed
 with a different count from a newer partition - so the StorageDescriptor
 for each partition has to be fetched across before we can generate a valid
 PathFilter.
 
 The SARGs are probably a better way to do this eventually as they can
 implement IN_BUCKET(1,2) to indicate 1 of 2 instead of the "0_1"
 PathFilter which is fragile.
 
 
 Right now, the most fool-proof solution we've hit upon was to apply the
 ORC bloom filter to the bucket columns, which is far safer as it does not
 care about the DDL - but does a membership check on the actual metadata and
 prunes deeper at the stripe-level if it is sorted as well.
 
 That is somewhat neat since this doesn't need any new options for querying
 - it automatically(*) kicks in for your query pattern.
 
 Cheers,
 Gopal
 (*) - conditions apply - there's a threshold for file-size for these
 filters to be evaluated during planning (to prevent HS2 from burning CPU).
 
 
 From:  Daniel Haviv daniel.ha...@veracity-group.com
 Reply-To:  user@hive.apache.org user@hive.apache.org
 Date:  Thursday, March 12, 2015 at 2:36 AM
 To:  user@hive.apache.org user@hive.apache.org
 Subject:  Bucket pruning
 
 
 Hi,
 We created a bucketed table and when we select in the following way:
 select * 
 from testtble
 where bucket_col ='X';
 
 We observe that all of the table is being read and not just the
 specific bucket.
 
 Does Hive support such a feature ?
 
 
 Thanks,
 Daniel
 
 


Re: Hive on Spark

2015-03-13 Thread Xuefu Zhang
You need to copy the spark-assembly.jar to your hive/lib.

Also, you can check hive.log to get more messages.
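
For example, assuming the paths from the original message and a standard Hive
layout (the Hive install directory below is a placeholder), the copy would be
something like:

cp /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar \
   /path/to/hive-1.1.0/lib/

followed by restarting the Hive session (or HiveServer2) so the jar is on the
classpath from startup rather than added per-session with add jar.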

On Fri, Mar 13, 2015 at 4:51 AM, Amith sha amithsh...@gmail.com wrote:

 Hi all,


 Recently I configured Spark 1.2.0; my environment is Hadoop 2.6.0 and Hive
 1.1.0. Here I have tried Hive on Spark, and while executing an insert into I
 am getting the following error.

 Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
 Total jobs = 1
 Launching Job 1 out of 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapreduce.job.reduces=number
 Failed to execute spark task, with exception
 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create
 spark client.)'
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.spark.SparkTask



 I have added the spark-assembly jar to the Hive lib directory, and also in the
 Hive console using the add jar command, followed by these steps:

 set spark.home=/opt/spark-1.2.1/;


 add jar
 /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;



 set hive.execution.engine=spark;


 set spark.master=spark://xxx:7077;


 set spark.eventLog.enabled=true;


 set spark.executor.memory=512m;


 set spark.serializer=org.apache.spark.serializer.KryoSerializer;

 Can anyone suggest a fix?



 Thanks & Regards
 Amithsha



RE: Which SerDe for Custom Binary Data.

2015-03-13 Thread Mich Talebzadeh
Hive as I use it is particularly useful for getting data out of relational
tables and, more importantly, querying that data using HiveQL (a variation of
Transact-SQL).

 

If your data is in binary format and assuming that you manage to store it in
HDFS, how are you intending to access the data? At the consumer level, what
tools are you going to use? Do you have a proprietary tool with the correct
drivers to access the data?

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 


 

From: karthik maddala [mailto:karthikmaddal...@gmail.com] 
Sent: 13 March 2015 15:56
To: user@hive.apache.org
Subject: Which SerDe for Custom Binary Data.

 

 

 

I want to set up a DW based on Hive. However, my data does not come as handy
CSV files but as binary files in a proprietary format.

The binary file consists of data serialized using the C language.

Could you please suggest which input format to use and how to write a
custom SerDe for the above-mentioned data.

 

 

Thanks,

Karthik Maddala

 

 



Which SerDe for Custom Binary Data.

2015-03-13 Thread karthik maddala
I want to set up a DW based on Hive. However, my data does not come as
handy CSV files but as binary files in a proprietary format.

The binary file consists of data serialized using the C language.


Could you please suggest which input format to use and how to write a
custom SerDe for the above-mentioned data.


Thanks,
Karthik Maddala


Fwd: Which SerDe for Custom Binary Data.

2015-03-13 Thread karthik maddala
I want to set up a DW based on Hive. However, my data does not come as
handy CSV files but as binary files in a proprietary format.

The binary file consists of data serialized using the C language.


Could you please suggest which input format to use and how to write a
custom SerDe for the above-mentioned data.


Thanks,
Karthik Maddala


Re: Which SerDe for Custom Binary Data.

2015-03-13 Thread Daniel Haviv
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HowtoWriteYourOwnSerDe


Daniel

 On 13 March 2015, at 17:56, karthik maddala karthikmaddal...@gmail.com wrote:
 
  
  
 I want to set up a DW based on Hive. However, my data does not come as handy
 CSV files but as binary files in a proprietary format.

 The binary file consists of data serialized using the C language.


 Could you please suggest which input format to use and how to write a
 custom SerDe for the above-mentioned data.
  
  
 Thanks,
 Karthik Maddala