store full line in block

2014-04-29 Thread Kishore kumar
Hi All,

My input data contains 45 columns. When I stored the data in a Hive table,
a line at the end of a block holds only part of the line, so when I query it
gives wrong results. Please help me store the full line in a block.



-- Thanks


*Kishore *


Re: store full line in block

2014-04-29 Thread Kishore kumar
Some part of the line is stored in one block, and the rest of the line is
stored in another block. When I query for column one, Hive returns as the
first column the first field of the other block (which begins from the
middle of the line).


On Tue, Apr 29, 2014 at 2:05 PM, Kishore kumar kish...@techdigita.in wrote:


 Hi All,

 My input data contains 45 columns. When I stored the data in a Hive table,
 a line at the end of a block holds only part of the line, so when I query it
 gives wrong results. Please help me store the full line in a block.



 -- Thanks


 *Kishore *




--


Number of hive-server2 threads increments after jdbc connection

2014-04-29 Thread Dima Fadeyev

Hello everyone,

When I run the JDBC example from 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode 
against my Hive server, the number of hive-server2 threads increments. 
If I execute it enough times, I either start seeing exceptions:


Exception in thread "main" java.sql.SQLException: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:203)
    at HiveJdbcClient.main(HiveJdbcClient.java:24)
Caused by: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:213)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:200)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:197)
    ... 1 more

or I bump into Zookeeper's connection limit (each hive-server2 
thread maintains a connection to Zookeeper; I have 
hive.support.concurrency enabled).


In either case I can't connect to the Hive server after that.

I've tried this on hive 0.10 (cdh 4.4) and hive 0.12 (cdh 5.0 and hdp 
2.0.6) with the same results.
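
For reference, a minimal, hypothetical loop that reproduces this kind of leak (host, port, and credentials are placeholders modeled on the wiki sample, not the actual setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LeakRepro {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        for (int i = 0; i < 1000; i++) {
            Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "hive");
            Statement stmt = con.createStatement();
            stmt.execute("show tables");
            // Neither stmt nor con is closed, so each iteration can leave a
            // hive-server2 handler thread (and its Zookeeper connection) behind.
        }
    }
}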


Could anyone please help me resolve this?
Thanks in advance.




OrcOutputFormat

2014-04-29 Thread Seema Datar
Hi,

I am trying to run an MR job to write files in ORC format.  I do not see any 
files created, although the job runs successfully. If I change the output format 
from OrcOutputFormat to TextOutputFormat (that being the only change), I 
see the output files getting created. I am using Hive 0.12.0. I tried upgrading 
to Hive 0.13.0, but with this version I get the following error:


2014-04-29 10:37:07,426 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:129)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:104)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)


How do you think this issue can be resolved?


Thanks,

Seema


Re: Number of hive-server2 threads increments after jdbc connection

2014-04-29 Thread Chinna Rao Lalam
Hi,

 Does your code create additional connections and statements? If so, are
those connections closed?
 After use, close any unused connections and statements.
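
For example, a minimal sketch of the close-after-use pattern (assuming Java 7+ try-with-resources; the host, port, and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ClosingClient {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // try-with-resources closes stmt and con even if execute() throws,
        // so HiveServer2 can release the per-connection resources.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "hive");
             Statement stmt = con.createStatement()) {
            stmt.execute("show tables");
        }
    }
}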


Hope It Helps,
Chinna


On Tue, Apr 29, 2014 at 3:47 PM, Dima Fadeyev dfade...@pragsis.com wrote:

  Hello everyone,

 When I run the JDBC example from
 https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode
 against my Hive server, the number of hive-server2 threads increments. If I
 execute it enough times, I either start seeing exceptions:

 Exception in thread "main" java.sql.SQLException: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:203)
     at HiveJdbcClient.main(HiveJdbcClient.java:24)
 Caused by: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
     at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
     at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:213)
     at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:200)
     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:197)
     ... 1 more

 or I bump into Zookeeper's connection limit (each hive-server2
 thread maintains a connection to Zookeeper; I have hive.support.concurrency
 enabled).

 In either case I can't connect to the Hive server after that.

 I've tried this on hive 0.10 (cdh 4.4) and hive 0.12 (cdh 5.0 and hdp
 2.0.6) with the same results.

 Could anyone please help me resolve this?
 Thanks in advance.





-- 
Hope It Helps,
Chinna


Re: Number of hive-server2 threads increments after jdbc connection

2014-04-29 Thread Dima Fadeyev

Hi, Chinna. Thanks for your reply.

Yes, modifying the code solves the problem. This is what my code looks like 
(a piece of it):


Connection con = DriverManager.getConnection(
    "jdbc:hive2://localhost:10000/default", "hive", "hive");

Statement stmt = con.createStatement();
String tableName = "testHiveDriverTable";
stmt.execute("drop table if exists " + tableName);
//stmt.close();

When I uncomment the last line, the number of hive-server2 threads 
doesn't keep growing indefinitely. However, I'm investigating an 
issue where the code is not really my code. Is there a way to correct 
this behavior from within hive-server2 without changing the client's code?
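
One server-side avenue worth checking (an assumption to verify against the Hive version in use: idle session/operation timeouts were only added in later releases, around Hive 0.14, per HIVE-5799) is letting HiveServer2 reap sessions and operations that clients never close, via hive-site.xml settings such as:

    hive.server2.idle.session.timeout=3600000     (ms; close sessions idle this long)
    hive.server2.idle.operation.timeout=600000    (ms; close long-idle operations)

On the 0.10/0.12 builds mentioned above these settings most likely do not exist yet, in which case closing statements on the client side remains the only reliable fix.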


On 29/04/14 14:05, Chinna Rao Lalam wrote:

Hi,

 Does your code create additional connections and statements? If so, are
those connections closed?

 After use, close any unused connections and statements.


Hope It Helps,
Chinna


On Tue, Apr 29, 2014 at 3:47 PM, Dima Fadeyev dfade...@pragsis.com wrote:


Hello everyone,

When I run the JDBC example from

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode
against my Hive server, the number of hive-server2 threads
increments. If I execute it enough times, I either start seeing
exceptions:

Exception in thread "main" java.sql.SQLException: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:203)
    at HiveJdbcClient.main(HiveJdbcClient.java:24)
Caused by: org.apache.thrift.TApplicationException: Internal error processing ExecuteStatement
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:213)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:200)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:197)
    ... 1 more

or I bump into Zookeeper's connection limit (each
hive-server2 thread maintains a connection to Zookeeper; I have
hive.support.concurrency enabled).

In either case I can't connect to the Hive server after that.

I've tried this on hive 0.10 (cdh 4.4) and hive 0.12 (cdh 5.0 and
hdp 2.0.6) with the same results.

Could anyone please help me resolve this?
Thanks in advance.





--
Hope It Helps,
Chinna




Bug in Hive Partition windowing functions?

2014-04-29 Thread Keith
Hi, 

we have an issue with a windowing function query never completing when
running against a large dataset (> 25,000 rows). That is, the reducer
(there is only one) never exits and appears stuck in an infinite loop.
I looked at the reducer counters and they never changed over the 6 hours it
stayed stuck in the loop.

When the data set is small (< 25K rows), it runs fine.

Is there any workaround for this issue? We tested
against Hive 0.11/0.12/0.13 and the result is the same.

create table window_function_fail
as
select a.*,
sum(case when bprice is not null then 1 else 0 end) over (partition by
date, name order by otime, bprice, aprice desc ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW) bidpid
from
large_table a;

create table large_table(
date   string,
name   string,
stime  string,
bprice decimal,
aprice decimal,
otime  double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' stored as textfile;
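
For reference, a hypothetical illustration of what the window expression computes within a single (date, name) partition ordered by otime, bprice, aprice desc (the values below are made up):

-- bidpid is a running count of rows whose bprice is non-null:
--   otime   bprice   bidpid
--   1.0     NULL     0
--   2.0     10.5     1
--   3.0     11.0     2
--   4.0     NULL     2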

Thanks in advance.





Re: Problem adding jar using pyhs2

2014-04-29 Thread David Engel
Hi Brad,

Your test, after editing for local host/file names, etc., worked.  It
must be something else I'm doing wrong in my development stuff.  At
least I know it should work.  I'll figure it out eventually.  Thanks
again.

David

On Mon, Apr 28, 2014 at 10:22:57AM -0700, Brad Ruderman wrote:
 Hi David-
 Can you test the code? It is working for me. Make sure your jar is in HDFS
 and you are using the FQDN for referencing it.
 
 import pyhs2
 
 with pyhs2.connect(host='127.0.0.1',
                    port=10000,
                    authMechanism="PLAIN",
                    user='root',
                    password='test',
                    database='default') as conn:
     with conn.cursor() as cur:
         cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
         cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
 
         # Execute query
         cur.execute("select substr(description,2,4) from sample_07")
 
         # Return column info from query
         print cur.getSchema()
 
         # Fetch table results
         for i in cur.fetch():
             print i
 
 Thanks,
 Brad
 
 
 On Mon, Apr 28, 2014 at 7:39 AM, David Engel da...@istwok.net wrote:
 
  Thanks for your response.
 
  We've essentially done your first suggestion in the past by copying or
  symlinking our jar into Hive's lib directory.  It works, but we'd like
  a better way for different users to use different versions of our
  jar during development.  Perhaps that's not possible, though, without
  running completely different instances of Hive.
 
  I don't think your second suggestion will work.  The original problem
  is that when "add jar file.jar" is run through pyhs2, the full
  command gets passed to AddResourceProcessor.run(), yet
  AddResourceProcessor.run() is written such that it only expects "jar
  file.jar" to be passed to it.  That's how it appears to work when
  "add jar file.jar" is run from a stand-alone Hive CLI and from Beeline.
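
To illustrate the mismatch described above, a minimal, hypothetical sketch (not Hive's actual code path) of the token stripping that would need to happen somewhere upstream:

public class StripAdd {
    public static void main(String[] args) {
        String command = "add jar /path/to/my.jar";  // what arrives via pyhs2
        // AddResourceProcessor.run() expects the leading "add" token already
        // removed, i.e. "jar /path/to/my.jar":
        String stripped = command.trim().split("\\s+", 2)[1];
        System.out.println(stripped);  // prints: jar /path/to/my.jar
    }
}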
 
  David
 
  On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
   An easy solution would be to add the jar to the classpath or auxlibs;
   that way every instance of Hive already has the jar and you just need to
   create the temporary function.
  
   Otherwise, you can put the JAR in HDFS and reference it in the add jar
   command using the hdfs scheme.  Example:
  
   import pyhs2
  
   with pyhs2.connect(host='127.0.0.1',
                      port=10000,
                      authMechanism="PLAIN",
                      user='root',
                      password='test',
                      database='default') as conn:
       with conn.cursor() as cur:
           cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
           cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
  
           # Execute query
           cur.execute("select substr(description,2,4) from sample_07")
  
           # Return column info from query
           print cur.getSchema()
  
           # Fetch table results
           for i in cur.fetch():
               print i
  
  
   On Fri, Apr 25, 2014 at 7:54 AM, David Engel da...@istwok.net wrote:
  
Hi,
   
 I'm trying to convert some of our Hive queries to use the pyhs2 Python
 package (https://github.com/BradRuderman/pyhs2).  Because we have our
 own jar with some custom SerDes and UDFs, we need to use the "add jar
 /path/to/my.jar" command to make them available to Hive.  This works
 fine using the Hive CLI directly and also with the Beeline client.  It
 doesn't work, however, with pyhs2.
   
 I naively tracked the problem down to a bug in
 AddResourceProcessor.run().  See HIVE-6971 in Jira.  My attempted fix
 turned out to be incorrect because it breaks the add command when
 used from the CLI and Beeline.  It seems the "add" part of any "add
 file|jar|archive ..." command needs to get stripped off somewhere
 before it gets passed to AddResourceProcessor.run().  Unfortunately, I
 can't find that location when the command is received from pyhs2.  Can
 someone help?
   
David
--
David Engel
da...@istwok.net
   
 
  --
  David Engel
  da...@istwok.net
 

-- 
David Engel
da...@istwok.net


Re: OrcOutputFormat

2014-04-29 Thread Abhishek Girish
Hi,

AFAIK, you would need to use the HCatalog APIs to read from / write to an
ORC file in an MR job. Please refer to
https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput
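
As a rough sketch of that approach (hedged: the package layout moved between org.apache.hcatalog and org.apache.hive.hcatalog across releases, the exact getTableSchema signature varies by version, and the table name here is an assumption; the target table must already exist as STORED AS ORC):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class OrcViaHCat {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "write-orc-via-hcatalog");
        // Write into default.orc_table; null means no static partition values.
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "orc_table", null));
        // Reuse the table's own schema for the records we emit.
        HCatSchema schema = HCatOutputFormat.getTableSchema(job.getConfiguration());
        HCatOutputFormat.setSchema(job, schema);
        job.setOutputFormatClass(HCatOutputFormat.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        // ... set a mapper/reducer that emit DefaultHCatRecord values ...
    }
}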

-Abhishek



On Tue, Apr 29, 2014 at 6:40 AM, Seema Datar sda...@yahoo-inc.com wrote:

  Hi,

 I am trying to run an MR job to write files in ORC format.  I do not see
 any files created, although the job runs successfully. If I change the
 output format from OrcOutputFormat to TextOutputFormat (that being the
 only change), I see the output files getting created. I am using
 Hive 0.12.0. I tried upgrading to Hive 0.13.0, but with this version I get
 the following error:

 2014-04-29 10:37:07,426 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: org/apache/hadoop/hive/ql/io/orc/OrcProto$RowIndex
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:129)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:104)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:335)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1300)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

 How do you think this issue can be resolved?


 Thanks,

 Seema