Re: ORA-8177 with Hive transactions

2015-09-24 Thread Steve Howard
All,

We continue to struggle with this.  We *never* get the lock, and we found
one issue in which the retry logic gets stuck in an infinite loop.  We
submitted a JIRA for that (https://issues.apache.org/jira/browse/HIVE-11934),
and patched our version (HDP 2.3, Hive 1.2.1) with a fix in which the
deadlockCount variable is no longer reset in the lock() method.  That works,
but we still couldn't get the lock, and the exception was thrown after ten
retries.  At least we knew it was broken earlier ;)

We have made the changes to the HIVE transaction tables to enable
ROWDEPENDENCIES, but are still plagued with serialization errors that are
never resolved.
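
For context, ROWDEPENDENCIES cannot be switched on with ALTER TABLE in
Oracle, so enabling it means rebuilding each table.  A rough sketch of that
rebuild over JDBC (table names, the connect string, and the swap are
illustrative; indexes and grants must be recreated afterward, and the
metastore has to be quiesced first):

import java.sql.*;

// Sketch only: Oracle honors ROWDEPENDENCIES at CREATE time, so the table
// is copied and swapped.  Names and connection details are illustrative;
// HIVE_LOCKS indexes and grants must be recreated after the swap.
public class EnableRowDependencies {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
         Statement st = conn.createStatement()) {
      st.execute("create table HIVE_LOCKS_NEW rowdependencies"
          + " as select * from HIVE_LOCKS");
      st.execute("rename HIVE_LOCKS to HIVE_LOCKS_OLD");
      st.execute("rename HIVE_LOCKS_NEW to HIVE_LOCKS");
    }
  }
}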

We have only a single writer, as the Hive database environment is used as a
target for an existing EDW dataset.  The job loader is the only one making
changes.  However, it is for analytics, so we have a lot of readers.

We have considered changing TRANSACTION_SERIALIZABLE to READ_COMMITTED for
the dbConn() method call in the TxnHandler class, since Oracle provides
consistent reads.  Of course, the serialization exception is thrown when one
thread (a reader or a writer, I guess) attempts to lock a hive table (or in
our case, several hundred daily hive table partitions) and update the row,
while another thread has changed and committed it in the meantime.
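
Concretely, the isolation level is the int passed to getDbConn(): in the
decompiled lock() quoted further down, getDbConn(8) is
java.sql.Connection.TRANSACTION_SERIALIZABLE.  A minimal sketch of the shape
of the change we are weighing (the connection handling is illustrative, not
a tested patch):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch of the isolation change under consideration.  In
// java.sql.Connection, TRANSACTION_SERIALIZABLE == 8 (hence getDbConn(8)
// in the decompile) and TRANSACTION_READ_COMMITTED == 2.  The connect
// string is illustrative; the real method draws from a pool.
public class IsolationSketch {
  static Connection getDbConn(int isolationLevel) throws SQLException {
    Connection dbConn = DriverManager.getConnection(
        "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
    dbConn.setAutoCommit(false);
    dbConn.setTransactionIsolation(isolationLevel);
    return dbConn;
  }

  public static void main(String[] args) throws SQLException {
    // today: getDbConn(Connection.TRANSACTION_SERIALIZABLE), i.e. getDbConn(8)
    Connection c = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
    c.close();
  }
}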

Unless I'm missing it, this will always be an issue, since we have readers
and writers and each appears to take a lock.

If we know we will have a single writer, the largest risk is that the
reader thinks the data hasn't changed, when it has.  For our needs, that
isn't a huge issue.

Are we missing something?  Any ideas?

Thanks,

Steve

On Fri, Sep 18, 2015 at 3:39 PM, Steve Howard 
wrote:

> I think ROWDEPENDENCIES on an Oracle table also covers this issue, so I
> don't think a separate JIRA is needed for the INITRANS change.
>
> On Fri, Sep 18, 2015 at 2:51 PM, Sergey Shelukhin 
> wrote:
>
>> There’s HIVE-11831 <https://issues.apache.org/jira/browse/HIVE-11831>
>>  and https://issues.apache.org/jira/browse/HIVE-11833 that try to
>> address this.
>> We can do a patch similar to the first one; can you file a JIRA?
>>
>> From: Steve Howard 
>> Reply-To: "user@hive.apache.org" 
>> Date: Friday, September 18, 2015 at 10:54
>> To: "user@hive.apache.org" 
>> Subject: ORA-8177 with Hive transactions
>>
>> While troubleshooting an issue with transactions shortly after enabling
>> them, I noticed the following in an Oracle trace, which is our metastore
>> for hive...
>>
>> ORA-8177: can't serialize access for this transaction
>>
>> These were thrown on "insert into HIVE_LOCKS..."
>>
>> Traditionally in Oracle, if an application actually needs serializable
>> transactions, the fix is to set initrans and maxtrans to the number of
>> concurrent writers.  When I ran what is below on a table similar to
>> HIVE_LOCKS, this exception was thrown everywhere.  The fix is to recreate
>> the table with a higher value for initrans (the default is only 1 for
>> initrans, while 255 is the default for maxtrans).  When I did this and
>> re-ran what is below, the exceptions were no longer thrown.
>>
>> Does anyone have any feedback on this performance hint?  The exceptions
>> in hive are thrown from the checkRetryable method in the TxnHandler class,
>> but I couldn't find which class/method throws them.  Perhaps the exceptions
>> are not impactful, but given the fact the method expects them, as it checks
>> for the string in the exception message, I thought I would ask for feedback
>> before we recreate the HIVE_LOCKS table with a higher value for INITRANS.
>>
>> import java.sql.*;
>>
>> public class testLock implements Runnable {
>>   public static void main (String[] args) throws Exception {
>>     Class.forName("oracle.jdbc.driver.OracleDriver");
>>     for (int i = 1; i <= 100; i++) {
>>       testLock tl = new testLock();
>>     }
>>   }
>>
>>   public testLock() {
>>     Thread t = new Thread(this);
>>     t.start();
>>   }
>>
>>   public void run() {
>>     try {
>>       Connection conn = DriverManager.getConnection(
>>           "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
>>       conn.createStatement().execute(
>>           "alter session set isolation_level = serializable");
>>       PreparedStatement pst = conn.prepareStatement("update test set a = ?");
>>       for (int j = 1; j <= 1; j++) {
>>         pst.setInt(1, j);
>>         pst.execute();
>>         conn.commit();
>>         System.out.println("worked");
>>       }
>>     }
>>     catch (Exception e) {
>>       System.out.println(e.getMessage());
>>     }
>>   }
>> }
>>
>>
>


Re: Transaction deadlocks

2015-09-23 Thread Steve Howard
Done, HIVE-11934

Sent from my iPad

> On Sep 23, 2015, at 12:03 PM, Eugene Koifman  wrote:
> 
> could you file a Jira for this please?  
> 
> From: Eugene Koifman 
> Reply-To: "user@hive.apache.org" 
> Date: Wednesday, September 23, 2015 at 8:15 AM
> To: "user@hive.apache.org" 
> Subject: Re: Transaction deadlocks
> 
> that’s indeed a problem.
> 
> From: Steve Howard 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, September 22, 2015 at 6:48 PM
> To: "user@hive.apache.org" 
> Subject: Re: Transaction deadlocks
> 
> Am I missing it, or do you reset the deadlock count to 0 every time the
> lock() method is called, so the limit of ten is never reached in
> checkRetryable()?
> 
>   public LockResponse lock(LockRequest rqst)
> throws NoSuchTxnException, TxnAbortedException, MetaException
>   {
> >>>this.deadlockCnt = 0; <<<
> try
> {
>   Connection dbConn = null;
>   try
>   {
> dbConn = getDbConn(8);
> return lock(dbConn, rqst, true);
>   }
>   catch (SQLException e)
>   {
> LOG.debug("Going to rollback");
> rollbackDBConn(dbConn);
> checkRetryable(dbConn, e, "lock(" + rqst + ")");
> throw new MetaException("Unable to update transaction database " + 
> StringUtils.stringifyException(e));
>   }
>   finally
>   {
> closeDbConn(dbConn);
>   }
>   return lock(rqst);
> }
> catch (RetryException e) {}
>   }
> 
> 
>> On Tue, Sep 22, 2015 at 9:24 PM, Steve Howard  wrote:
>> Hi Eugene,
>> 
>> This is HDP 2.3, 1.2.1.2.3.0.0-2557.
>> 
>> I see the deadlock count is 10, and we haven't overridden
>> ConfVars.HMSHANDLERATTEMPTS, which also defaults to 10.  I am at a loss why
>> the call stack would be that large (I think it is 1,000), as that indicates
>> to me it keeps failing to get the lock and keeps calling itself.
>> 
>>   protected void checkRetryable(Connection conn, SQLException e, String caller)
>>     throws TxnHandler.RetryException, MetaException
>>   {
>>     if ((this.dbProduct == null) && (conn != null)) {
>>       determineDatabaseProduct(conn);
>>     }
>>     if (((e instanceof SQLTransactionRollbackException)) ||
>>         (((this.dbProduct != DatabaseProduct.MYSQL) && (this.dbProduct != DatabaseProduct.POSTGRES) && (this.dbProduct != DatabaseProduct.SQLSERVER)) ||
>>         ((e.getSQLState().equals("40001")) || ((this.dbProduct == DatabaseProduct.POSTGRES) && (e.getSQLState().equals("40P01"))) ||
>>         ((this.dbProduct == DatabaseProduct.ORACLE) && ((e.getMessage().contains("deadlock detected")) ||
>>         (e.getMessage().contains("can't serialize access for this transaction")))
>>     {
>>       if (this.deadlockCnt++ < 10)
>>       {
>>         long waitInterval = this.deadlockRetryInterval * this.deadlockCnt;
>>         LOG.warn("Deadlock detected in " + caller + ". Will wait " + waitInterval + "ms try again up to " + (10 - this.deadlockCnt + 1) + " times.");
>>         try
>>         {
>>           Thread.sleep(waitInterval);
>>         }
>>         catch (InterruptedException ie) {}
>>         throw new RetryException();
>>       }
>>       LOG.error("Too many repeated deadlocks in " + caller + ", giving up.");
>>       this.deadlockCnt = 0;
>>     }
>>     else if (isRetryable(e))
>>     {
>>       if (this.retryNum++ < this.retryLimit)
>>       {
>>         LOG.warn("Retryable error detected in " + caller + ".  Will wait " + this.retryInterval + "ms and retry up to " + (this.retryLimit - this.retryNum + 1) + " times.  Error: " + getMessage(e));
>>         try
>>         {
>>           Thread.sleep(this.retryInterval);
>>         }
>>         catch (InterruptedException ex) {}
>>         throw new RetryException();
>>       }
>>       LOG.error("Fatal error. Retry limit (" + this.retryLimit + ") reached. Last error: " + getMessage(e));
>>       this.retryNum = 0;
>>     }
>>     else
>>     {
>>       this.deadlockCnt = 0;
>>       this.retryNum = 0;
>>     }
>>   }
>> 
>> Thanks,
>> 
>> Steve
>> 
>>> On Tue, Sep 22, 2015 at 6:22 PM, Eugene Koifman  
>>>

Re: Transaction deadlocks

2015-09-22 Thread Steve Howard
Am I missing it, or do you reset the deadlock count to 0 every time the
lock() method is called, so the limit of ten is never reached in
checkRetryable()?

  public LockResponse lock(LockRequest rqst)
throws NoSuchTxnException, TxnAbortedException, MetaException
  {
>>>this.deadlockCnt = 0; <<<
try
{
  Connection dbConn = null;
  try
  {
dbConn = getDbConn(8);
return lock(dbConn, rqst, true);
  }
  catch (SQLException e)
  {
LOG.debug("Going to rollback");
rollbackDBConn(dbConn);
checkRetryable(dbConn, e, "lock(" + rqst + ")");
throw new MetaException("Unable to update transaction database " +
StringUtils.stringifyException(e));
  }
  finally
  {
closeDbConn(dbConn);
  }
  return lock(rqst);
}
catch (RetryException e) {}
  }


On Tue, Sep 22, 2015 at 9:24 PM, Steve Howard 
wrote:

> Hi Eugene,
>
> This is HDP 2.3, 1.2.1.2.3.0.0-2557.
>
> I see the deadlock count is 10, and we haven't overridden
> ConfVars.HMSHANDLERATTEMPTS, which also defaults to 10.  I am at a loss why
> the call stack would be that large (I think it is 1,000), as that indicates
> to me it keeps failing to get the lock and keeps calling itself.
>
>   protected void checkRetryable(Connection conn, SQLException e, String caller)
>     throws TxnHandler.RetryException, MetaException
>   {
>     if ((this.dbProduct == null) && (conn != null)) {
>       determineDatabaseProduct(conn);
>     }
>     if (((e instanceof SQLTransactionRollbackException)) ||
>         (((this.dbProduct != DatabaseProduct.MYSQL) && (this.dbProduct != DatabaseProduct.POSTGRES) && (this.dbProduct != DatabaseProduct.SQLSERVER)) ||
>         ((e.getSQLState().equals("40001")) || ((this.dbProduct == DatabaseProduct.POSTGRES) && (e.getSQLState().equals("40P01"))) ||
>         ((this.dbProduct == DatabaseProduct.ORACLE) && ((e.getMessage().contains("deadlock detected")) ||
>         (e.getMessage().contains("can't serialize access for this transaction")))
>     {
>       if (this.deadlockCnt++ < 10)
>       {
>         long waitInterval = this.deadlockRetryInterval * this.deadlockCnt;
>         LOG.warn("Deadlock detected in " + caller + ". Will wait " + waitInterval + "ms try again up to " + (10 - this.deadlockCnt + 1) + " times.");
>         try
>         {
>           Thread.sleep(waitInterval);
>         }
>         catch (InterruptedException ie) {}
>         throw new RetryException();
>       }
>       LOG.error("Too many repeated deadlocks in " + caller + ", giving up.");
>       this.deadlockCnt = 0;
>     }
>     else if (isRetryable(e))
>     {
>       if (this.retryNum++ < this.retryLimit)
>       {
>         LOG.warn("Retryable error detected in " + caller + ".  Will wait " + this.retryInterval + "ms and retry up to " + (this.retryLimit - this.retryNum + 1) + " times.  Error: " + getMessage(e));
>         try
>         {
>           Thread.sleep(this.retryInterval);
>         }
>         catch (InterruptedException ex) {}
>         throw new RetryException();
>       }
>       LOG.error("Fatal error. Retry limit (" + this.retryLimit + ") reached. Last error: " + getMessage(e));
>       this.retryNum = 0;
>     }
>     else
>     {
>       this.deadlockCnt = 0;
>       this.retryNum = 0;
>     }
>   }
>
> Thanks,
>
> Steve
>
> On Tue, Sep 22, 2015 at 6:22 PM, Eugene Koifman 
> wrote:
>
>> What version of Hive are you running?  In the current codebase at least,
>> TxnHandler.checkRetryable() keeps track of how many retries it allowed – it
>> will not retry forever.
>>
>>
>> From: Steve Howard 
>> Reply-To: "user@hive.apache.org" 
>> Date: Tuesday, September 22, 2015 at 1:32 PM
>> To: "user@hive.apache.org" 
>> Subject: Transaction deadlocks
>>
>>
>>
>> Thread A…
>>
>>
>>
>>
>>
>> "HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0
>> tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]
>>
>>java.lang.Thread.State: RUNNABLE
>>
>> at java.net.SocketInputStream.socketRead0(Native Method)
>>
>> at
>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>
>> at java.net.SocketInputStream.read(SocketInputStream.java:170)
>>
>> at java.net.SocketInputStream.read(So

Re: Transaction deadlocks

2015-09-22 Thread Steve Howard
Hi Eugene,

This is HDP 2.3, 1.2.1.2.3.0.0-2557.

I see the deadlock count is 10, and we haven't overridden
ConfVars.HMSHANDLERATTEMPTS, which also defaults to 10.  I am at a loss why
the call stack would be that large (I think it is 1,000), as that indicates
to me it keeps failing to get the lock and keeps calling itself.

  protected void checkRetryable(Connection conn, SQLException e, String caller)
    throws TxnHandler.RetryException, MetaException
  {
    if ((this.dbProduct == null) && (conn != null)) {
      determineDatabaseProduct(conn);
    }
    if (((e instanceof SQLTransactionRollbackException)) ||
        (((this.dbProduct != DatabaseProduct.MYSQL) && (this.dbProduct != DatabaseProduct.POSTGRES) && (this.dbProduct != DatabaseProduct.SQLSERVER)) ||
        ((e.getSQLState().equals("40001")) || ((this.dbProduct == DatabaseProduct.POSTGRES) && (e.getSQLState().equals("40P01"))) ||
        ((this.dbProduct == DatabaseProduct.ORACLE) && ((e.getMessage().contains("deadlock detected")) ||
        (e.getMessage().contains("can't serialize access for this transaction")))
    {
      if (this.deadlockCnt++ < 10)
      {
        long waitInterval = this.deadlockRetryInterval * this.deadlockCnt;
        LOG.warn("Deadlock detected in " + caller + ". Will wait " + waitInterval + "ms try again up to " + (10 - this.deadlockCnt + 1) + " times.");
        try
        {
          Thread.sleep(waitInterval);
        }
        catch (InterruptedException ie) {}
        throw new RetryException();
      }
      LOG.error("Too many repeated deadlocks in " + caller + ", giving up.");
      this.deadlockCnt = 0;
    }
    else if (isRetryable(e))
    {
      if (this.retryNum++ < this.retryLimit)
      {
        LOG.warn("Retryable error detected in " + caller + ".  Will wait " + this.retryInterval + "ms and retry up to " + (this.retryLimit - this.retryNum + 1) + " times.  Error: " + getMessage(e));
        try
        {
          Thread.sleep(this.retryInterval);
        }
        catch (InterruptedException ex) {}
        throw new RetryException();
      }
      LOG.error("Fatal error. Retry limit (" + this.retryLimit + ") reached. Last error: " + getMessage(e));
      this.retryNum = 0;
    }
    else
    {
      this.deadlockCnt = 0;
      this.retryNum = 0;
    }
  }

Thanks,

Steve

On Tue, Sep 22, 2015 at 6:22 PM, Eugene Koifman 
wrote:

> What version of Hive are you running?  In the current codebase at least,
> TxnHandler.checkRetryable() keeps track of how many retries it allowed – it
> will not retry forever.
>
>
> From: Steve Howard 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, September 22, 2015 at 1:32 PM
> To: "user@hive.apache.org" 
> Subject: Transaction deadlocks
>
>
>
> Thread A…
>
>
>
>
>
> "HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0
> tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]
>
>java.lang.Thread.State: RUNNABLE
>
> at java.net.SocketInputStream.socketRead0(Native Method)
>
> at
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
>
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
>
> at oracle.net.ns.Packet.receive(Packet.java:300)
>
> at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
>
> at
> oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
>
> at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
>
> at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
>
> at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
>
> at
> oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
>
> at
> oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
>
> at
> oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
>
> at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
>
> at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
>
> at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
>
> at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
>
> at
> oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1033)
>
> at
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1329)
>
> at
> oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1838

Re: mapjoin with left join

2015-09-22 Thread Steve Howard
Hi Gopal/All,

Yep, I absolutely understand the limitation of what we are trying to do.
We will try the settings you suggested.

Thanks,

Steve

On Tue, Sep 22, 2015 at 1:44 PM, Gopal Vijayaraghavan 
wrote:

>
> > select small.* from small s left join large l on s.id = l.id where
> > l.id is null;
> ...
> > We simply want to load the 81K rows into RAM, then for each row in
> > large, probe the small hash table; any row of small that never matches
> > a row of large is new and gets added to large.
>
> That seems like a fair description of the problem with the approach, but
> it is fairly hard to do left outers in a distributed fashion with
> correctness.
>
> Imagine I load partition #1 from the big table + the small table into a
> task - how would I know partition #2 doesn't have a key from the small
> hashtable. So the system cannot output any rows in the naive version of
> this problem until it compares the small table to all parts of the big
> table.
>
> That I think was Sergey's point & I don't know of an easy way around that
> yet.
>
> > The business case is loading only new rows into a large fact table.  The
> >new rows are the ones that are small in number.
>
> That however is much easier - because that's better written as:
>
> insert into large select * from small where id NOT IN (select id from
> large);
>
> FYI, we want people to stop writing this sort of query once the MERGE
> syntax lands in HIVE - https://issues.apache.org/jira/browse/HIVE-10924
>
>
> The trick is to rewrite the NOT IN condition as two set operations - inner
> join + remainder left outer.
>
> Find all ids which already exist in bigtable with an inner map-join.
> Remove all those ids from the small table & insert the remainder.
>
> explain rewrite select remainder.* from small remainder where id not in
> (select id from small, large where small.id = large.id);
>
> That query, you will find will run much faster than the query you're
> currently using.
>
> If you can, please send the "explain rewrite" & I can probably fine-tune
> this approach further.
>
> If you're using Tez, you might want to try out the custom vertex managers
> as well
>
> set hive.vectorized.execution.mapjoin.minmax.enabled=true;
>
> set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;
> set hive.optimize.dynamic.partition.hashjoin=true;
>
> Those params should allow you to scale up a map-join to ~100x the
> available RAM (aimed at LLAP memory utilization).
>
> Cheers,
> Gopal
>
>
>


Transaction deadlocks

2015-09-22 Thread Steve Howard
Thread A…

"HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0 tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at oracle.net.ns.Packet.receive(Packet.java:300)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
        at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1033)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1329)
        at oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1838)
        at oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:1803)
        - locked <0xc09fcda0> (a oracle.jdbc.driver.T4CConnection)
        at oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:294)
        at org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)
        at org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1432)
        - locked <0xc09fcc28> (a java.lang.Object)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:422)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
…blocks Thread B, resulting in a deadlock message being written to the logs
over and over again…

"HiveServer2-Background-Pool: Thread-51" #51 prio=5 os_prio=0 tid=0x0279f000 nid=0x7227 waiting for monitor entry [0x7fd146a47000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1361)
        - waiting to lock <0xc09fcc28> (a java.lang.Object)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:422)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

…but my guess is the bigger issue is that eventually the stack will be
exhausted, as the method below recursively calls itself (notice the stack
above)…

  public LockResponse lock(LockRequest rqst)
    throws NoSuchTxnException, TxnAbortedException, MetaException
  {
    this.deadlockCnt = 0;
    try
    {
      Connection dbConn = null;
      try
      {
        dbConn = getDbConn(8);
        return lock(dbConn, rqst, true);
      }
      catch (SQLException e)
      {
        LOG.debug("Going to rollback");
        rollbackDBConn(dbConn);
        checkRetryable(dbConn, e, "lock(" + rqst + ")");
        throw 
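
For illustration only, a retry written as a bounded loop rather than
recursion keeps the stack flat; a minimal sketch, with illustrative names,
and not the actual Hive code or the HIVE-11934 patch:

import java.sql.SQLException;

// Sketch only: a loop bounds both the attempt count and the stack depth,
// unlike the recursive lock() above.  RETRY_LIMIT and the backoff are
// illustrative stand-ins, not Hive's configuration.
public class RetryLoopSketch {
  static final int RETRY_LIMIT = 10;

  interface Attempt { void run() throws SQLException; }

  static void withRetry(Attempt a) throws Exception {
    for (int attempt = 1; attempt <= RETRY_LIMIT; attempt++) {
      try {
        a.run();
        return;                             // success: no recursion involved
      } catch (SQLException e) {
        if (attempt == RETRY_LIMIT) throw e; // give up loudly at the limit
        Thread.sleep(100L * attempt);        // linear backoff between tries
      }
    }
  }

  public static void main(String[] args) throws Exception {
    withRetry(() -> { /* one JDBC lock attempt would go here */ });
    System.out.println("acquired");
  }
}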

Re: ORA-8177 with Hive transactions

2015-09-18 Thread Steve Howard
I think ROWDEPENDENCIES on an Oracle table also covers this issue, so I
don't think a separate JIRA is needed for the INITRANS change.

On Fri, Sep 18, 2015 at 2:51 PM, Sergey Shelukhin 
wrote:

> There’s HIVE-11831 <https://issues.apache.org/jira/browse/HIVE-11831> and
> https://issues.apache.org/jira/browse/HIVE-11833 that try to address this.
> We can do a patch similar to the first one; can you file a JIRA?
>
> From: Steve Howard 
> Reply-To: "user@hive.apache.org" 
> Date: Friday, September 18, 2015 at 10:54
> To: "user@hive.apache.org" 
> Subject: ORA-8177 with Hive transactions
>
> While troubleshooting an issue with transactions shortly after enabling
> them, I noticed the following in an Oracle trace, which is our metastore
> for hive...
>
> ORA-8177: can't serialize access for this transaction
>
> These were thrown on "insert into HIVE_LOCKS..."
>
> Traditionally in Oracle, if an application actually needs serializable
> transactions, the fix is to set initrans and maxtrans to the number of
> concurrent writers.  When I ran what is below on a table similar to
> HIVE_LOCKS, this exception was thrown everywhere.  The fix is to recreate
> the table with a higher value for initrans (the default is only 1 for
> initrans, while 255 is the default for maxtrans).  When I did this and
> re-ran what is below, the exceptions were no longer thrown.
>
> Does anyone have any feedback on this performance hint?  The exceptions in
> hive are thrown from the checkRetryable method in the TxnHandler class, but
> I couldn't find which class/method throws them.  Perhaps the exceptions are
> not impactful, but given the fact the method expects them, as it checks for
> the string in the exception message, I thought I would ask for feedback
> before we recreate the HIVE_LOCKS table with a higher value for INITRANS.
>
> import java.sql.*;
>
> public class testLock implements Runnable {
>   public static void main (String[] args) throws Exception {
>     Class.forName("oracle.jdbc.driver.OracleDriver");
>     for (int i = 1; i <= 100; i++) {
>       testLock tl = new testLock();
>     }
>   }
>
>   public testLock() {
>     Thread t = new Thread(this);
>     t.start();
>   }
>
>   public void run() {
>     try {
>       Connection conn = DriverManager.getConnection(
>           "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
>       conn.createStatement().execute(
>           "alter session set isolation_level = serializable");
>       PreparedStatement pst = conn.prepareStatement("update test set a = ?");
>       for (int j = 1; j <= 1; j++) {
>         pst.setInt(1, j);
>         pst.execute();
>         conn.commit();
>         System.out.println("worked");
>       }
>     }
>     catch (Exception e) {
>       System.out.println(e.getMessage());
>     }
>   }
> }
>
>


ORA-8177 with Hive transactions

2015-09-18 Thread Steve Howard
While troubleshooting an issue with transactions shortly after enabling
them, I noticed the following in an Oracle trace, which is our metastore
for hive...

ORA-8177: can't serialize access for this transaction

These were thrown on "insert into HIVE_LOCKS..."

Traditionally in Oracle, if an application actually needs serializable
transactions, the fix is to set initrans and maxtrans to the number of
concurrent writers.  When I ran what is below on a table similar to
HIVE_LOCKS, this exception was thrown everywhere.  The fix is to recreate
the table with a higher value for initrans (the default is only 1 for
initrans, while 255 is the default for maxtrans).  When I did this and
re-ran what is below, the exceptions were no longer thrown.

Does anyone have any feedback on this performance hint?  The exceptions in
hive are thrown from the checkRetryable method in the TxnHandler class, but
I couldn't find which class/method throws them.  Perhaps the exceptions are
not impactful, but given the fact the method expects them, as it checks for
the string in the exception message, I thought I would ask for feedback
before we recreate the HIVE_LOCKS table with a higher value for INITRANS.

import java.sql.*;

public class testLock implements Runnable {
  public static void main (String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    // start 100 threads that all update the same table concurrently
    for (int i = 1; i <= 100; i++) {
      testLock tl = new testLock();
    }
  }

  public testLock() {
    Thread t = new Thread(this);
    t.start();
  }

  public void run() {
    try {
      Connection conn = DriverManager.getConnection(
          "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
      // match the isolation level the metastore runs under
      conn.createStatement().execute(
          "alter session set isolation_level = serializable");
      PreparedStatement pst = conn.prepareStatement("update test set a = ?");
      for (int j = 1; j <= 1; j++) {
        pst.setInt(1, j);
        pst.execute();
        conn.commit();
        System.out.println("worked");
      }
    }
    catch (Exception e) {
      // ORA-8177 surfaces here when another session has touched the block
      System.out.println(e.getMessage());
    }
  }
}
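
If we do go down this road, raising INITRANS does not actually require a
full drop-and-recreate: ALTER TABLE ... INITRANS only affects newly
formatted blocks, while MOVE rewrites the existing ones.  A sketch over
JDBC (the INITRANS value is arbitrary, the connect string is illustrative,
and the index name is from our metastore schema, so yours may differ):

import java.sql.*;

// Sketch of raising INITRANS in place.  MOVE rebuilds existing blocks and
// invalidates the table's indexes, hence the rebuild.  Values and names
// are illustrative.
public class RaiseInitrans {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
         Statement st = conn.createStatement()) {
      st.execute("alter table HIVE_LOCKS move initrans 100");
      st.execute("alter index HL_TXNID_INDEX rebuild"); // name may differ
    }
  }
}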


mapjoin with left join

2015-09-11 Thread Steve Howard
We would like to utilize mapjoin for the following SQL construct:

select small.* from small s left join large l on s.id = l.id where l.id is
null;

We can easily fit small into RAM, but large is over 1TB according to
optimizer stats. Unless we set
hive.auto.convert.join.noconditionaltask.size to at least the size of
"large", the optimizer falls back to a common (merge) join, which is
incredibly slow.

Given the fact it is a left join, which means we won't always have rows in
large for each row in small, is this behavior expected? Could it be that
reading the large table would miss the new rows in small, so the large one
has to be the one that is probed for matches?

We simply want to load the 81K rows into RAM, then for each row in large,
probe the small hash table; any row of small that never matches a row of
large is new and gets added to large.
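
In code form, the anti-join we are describing is just a hash probe; a
self-contained sketch with in-memory collections standing in for the two
tables (the ids are illustrative):

import java.util.*;

// Sketch of the anti-join described above: build a hash set from the
// small side, stream the large side once, and whatever survives in the
// set is genuinely new.  Collections stand in for the tables.
public class AntiJoinSketch {
  public static void main(String[] args) {
    Set<Long> smallIds = new HashSet<>(Arrays.asList(1L, 2L, 3L)); // "small"
    List<Long> largeIds = Arrays.asList(2L, 3L, 4L, 5L);           // "large"
    for (long id : largeIds) {
      smallIds.remove(id); // discard small rows already present in large
    }
    System.out.println("rows to load: " + smallIds); // prints [1]
  }
}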

Again, the optimizer will use a mapjoin if we set
hive.auto.convert.join.noconditionaltask.size = 1TB (the size of the large
table). This is of course, not practical. The small table is only 50MB.

At the link below is the entire test case with two tables, one of which has
three rows and the other 96. We can duplicate it with tables this small,
which leads me to believe I am missing something, or this is a bug.

The link has the source code that shows each table create, as well as the
explain with an argument for hive.auto.convert.join.noconditionaltask.size
that is passed at the command line. The output shows a mergejoin when the
hive.auto.convert.join.noconditionaltask.size size is less than 192 (the
size of the larger table), and a mapjoin when
hive.auto.convert.join.noconditionaltask.size is larger than 192 (large
table fits).

http://pastebin.com/Qg6hb8yV

The business case is loading only new rows into a large fact table.  The
new rows are the ones that are small in number.


Logging configuration

2015-09-08 Thread Steve Howard
We are having an issue for which we would like to enable logging.  The
component is
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer.  We
have added entries to both hive-log4j.properties
and hive-exec-log4j.properties as follows:

log4j.appender.ts=org.apache.log4j.FileAppender
log4j.appender.ts.file=/tmp/myhive.txt
log4j.appender.ts.layout = org.apache.log4j.SimpleLayout
log4j.logger.org.apache.hadoop.hive=INFO,ts

We see the component mentioned above in the stack trace for the exception
we are troubleshooting.  However, we see nothing in either the hive log or
our custom log appender.

We have tried naming the entire component, changing INFO to DEBUG, etc. but
no output for the LOG.info() line we see in the source code for the
component (which is in the method that is part of the stack trace).  We do
see other INFO type messages in the log that we don't normally see.  We did
restart hiveserver2 prior to running the problematic query.

How do you configure logging for a single component, or even a package tree
like org.apache.hadoop.hive?
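
For what it's worth, the standard log4j 1.x way to scope a single class is
a logger entry keyed by the full class name, with additivity disabled so
its messages don't also flow to the root logger; a sketch reusing the ts
appender defined above:

log4j.logger.org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer=DEBUG,ts
log4j.additivity.org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer=false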


Re: Permission denied error when starting HiveServer2

2015-09-07 Thread Steve Howard
strace is your friend if you are on Linux.  Try the following from the
shell in which you are starting hive...

strace -f -e trace=file service hive-server2 start 2>&1 | grep ermission

You should see the file it can't read/write.

On Mon, Sep 7, 2015 at 8:13 AM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:

> Hi,
> I'm getting this error when starting HiveServer2:
> 2015-09-07 08:09:50,356 WARN org.apache.hive.service.server.HiveServer2:
> Error starting HiveServer2 on attempt 1, will retry in 60 seconds
> java.lang.RuntimeException: java.lang.RuntimeException:
> java.io.IOException: Permission denied
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)
> at
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:124)
> at org.apache.hive.service.cli.CLIService.init(CLIService.java:111)
> at
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
> at
> org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92)
> at
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309)
> at
> org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68)
> at
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523)
> at
> org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.RuntimeException: java.io.IOException: Permission
> denied
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:465)
> ... 14 more
> Caused by: java.io.IOException: Permission denied
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createTempFile(File.java:2024)
> at
> org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:740)
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:463)
> ... 14 more
>
> the hive user has write permissions to the scratch dir, is there another
> path I should take care of ?
>
> Thank you.
> Daniel
>


Simulating a login trigger in hive

2015-07-21 Thread Steve Howard
We would like to assign a YARN queue to a user upon login.

Is there any way to do this out of the box?  If not, is anyone aware of
any development effort to do this?

It sounds like it would be pretty simple to extend the Connection class to
look up a queue in a custom table in the metastore for a given user and set
tez.queue.name=foo.  If one isn't found, assign the default queue.
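
A client-side sketch of the idea (the JDBC URL, credentials, and the queue
lookup are illustrative stand-ins; a real implementation would live
server-side, e.g. in a hook, so users could not bypass it):

import java.sql.*;

// Sketch of assigning a queue at connect time.  The lookup is a stand-in
// for a query against a custom metastore table; connection details are
// illustrative.
public class QueueOnLogin {
  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://hs2host:10000/default", "someuser", "pwd");
    try (Statement st = conn.createStatement()) {
      st.execute("set tez.queue.name=" + lookupQueueFor("someuser"));
    }
    // ... the user's queries now run in their assigned queue ...
    conn.close();
  }

  static String lookupQueueFor(String user) {
    return "default"; // a real version would consult the metastore table
  }
}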


Re: hive on tez error

2015-06-15 Thread Steve Howard
What does netstat -anp | grep 9000 show?

On Mon, Jun 15, 2015 at 3:47 PM, Sateesh Karuturi <
sateesh.karutu...@gmail.com> wrote:

> I am using hive 1.0.0 and tez 0.5.2.  Whenever I try to open hive I get
> the following error:
> Exception in thread "main" java.lang.RuntimeException:
> java.io.IOException: Previous writer likely failed to write
> hdfs://localhost:9000/tmp/hive/hadoop/_tez_session_dir/002dad89-59b6-43c9-92f9-1c7b2232b1c4/tez.
> Failing because I am unlikely to write too.
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:457)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:626)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.io.IOException: Previous writer likely failed to write
> hdfs://localhost:9000/tmp/hive/hadoop/_tez_session_dir/002dad89-59b6-43c9-92f9-1c7b2232b1c4/tez.
> Failing because I am unlikely to write too.
> at
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:979)
> at
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:860)
> at
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:803)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:228)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:154)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
> ... 8 more
>
>


Re: Create ORC Table on Tez Failed

2015-06-04 Thread Steve Howard
I would look in the hive server log, as well as the name node log.  That should 
have a full stack trace.


> On Jun 4, 2015, at 4:13 AM, "Vijay Bhoomireddy" 
>  wrote:
> 
> Hi,
>  
> We are trying to create a  Hive ORC Table  with Tez. DDL command for the same 
> is
>  
>  
> create table orc_table ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES 
> TERMINATED BY '\n'
> STORED AS ORC AS select * from select_table;
>  
> However, it fails with the following error message:
>  
> Moving data to: hdfs://:8020/apps/hive/warehouse/test
> Failed with exception Unable to rename: 
> hdfs://:8020/tmp/hive/hdfs/d240f669-b238-4697-9315-5f77eefff887/hive_2015-05-20_21-44-22_028_8798585586348522774-1/-ext-10001
>  to: hdfs://:8020/apps/hive/warehouse/test
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
>  
> Can anyone please let me know what could be the likely cause of the issue and 
> possible way to get rid this error message. We are kind of stuck. Please let 
> us know your thoughts.
>  
> Thanks & Regards
> Vijay
> 
> The contents of this e-mail are confidential and for the exclusive use of the 
> intended recipient. If you receive this e-mail in error please delete it from 
> your system immediately and notify us either by e-mail or telephone. You 
> should not copy, forward or otherwise disclose the content of the e-mail. The 
> views expressed in this communication may not necessarily be the view held by 
> WHISHWORKS.


Re: Change hive column size

2015-05-28 Thread Steve Howard
Hi Udit,

That JIRA is five years old and applies to hive 0.5.  Newer releases allow far 
larger columns...

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

Thanks,

Steve

Sent from my iPad

> On May 28, 2015, at 9:08 PM, Udit Mehta  wrote:
> 
> Hi,
> 
> Per this ticket: https://issues.apache.org/jira/browse/HIVE-1364 , the max 
> column size in hive is limited to 4000 chars. But I do read that there is a 
> way to increase it via mysql, which is our database for the metastore. Can 
> anyone point me to how I can do this?
> 
> Our columns have deeply nested structs and easily cross the 4000 chars limit.
> 
> Thanks,
> Udit


Re: Executing HQL files from JAVA application.

2015-03-25 Thread Steve Howard
I would argue that executing arbitrary code fetched from a remote server has 
just increased your security footprint, since it is one more access point you 
have to control.

Purely out of curiosity, is there a compelling architectural reason or 
environment limitation that requires you to do it this way?

Sent from my iPad

> On Mar 25, 2015, at 9:57 AM, Amal Gupta  wrote:
> 
> Hi Gopal,
> 
> Thanks a lot. 
> Connectivity to the HiveServer2 was not an issue. We were able to connect 
> using the example that you shared and using Beeline. The issue is script 
> execution from the java app. Maybe I missed something, but I was not able 
> to find an efficient and elegant way to execute hive scripts placed at a 
> specific location from the java app.
> 
> The scenario is: an app placed at location A should be able to connect to 
> hiveServer2 at B and execute the script TestHive.hql placed at a location 
> on B (say root/testProject/hive/scripts/TestHive.hql).
> 
> Regards,
> Amal
> 
> -Original Message-
> From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal 
> Vijayaraghavan
> Sent: Wednesday, March 25, 2015 8:49 AM
> To: user@hive.apache.org
> Cc: Amal Gupta
> Subject: Re: Executing HQL files from JAVA application.
> 
> Hi,
> 
> Any mechanism which bypasses schema layers for SQL is a bad idea.
> 
> See this example for how you can connect to HiveServer2 directly from Java
> -
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveSe
> rver2Clients-JDBCClientSampleCode
> 
> Use the JDBC driver to access HiveServer2 through a first-class Java API.
> 
> If you find any performance issues with this method, let me know and I can 
> fix it.
> 
> Cheers,
> Gopal
> 
> From:  Amal Gupta 
> Reply-To:  "user@hive.apache.org" 
> Date:  Sunday, March 22, 2015 at 10:53 PM
> To:  "user@hive.apache.org" 
> Subject:  RE: Executing HQL files from JAVA application.
> 
> 
> Hey Mich,
> 
> 
> Got any clues regarding the failure of the code that I sent?
> 
> I was going through the project and the code again, and I suspect 
> mismatched dependencies are the culprits. I am currently trying to 
> re-align the dependencies per the pom given on mvnrepository.com while 
> trying to see if a particular configuration succeeds.
> 
> Will keep you posted on my progress.  Thanks again for all the help that 
> you are providing. :)
> 
> Regards,
> Amal
> 
> 
> From: Amal Gupta
> 
> Sent: Sunday, March 22, 2015 7:52 AM
> To: user@hive.apache.org
> Subject: RE: Executing HQL files from JAVA application.
> 
> 
> 
> Hi Mich,
> 
> :) A coincidence. I am new to hive as well. My test script which I am 
> trying to execute contains a drop and a create statement.
> 
> Script :-
> 
> use test_db;
> DROP TABLE IF EXISTS demoHiveTable;
> CREATE EXTERNAL TABLE demoHiveTable (
> demoId string,
> demoName string
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
> 
> STORED AS TEXTFILE LOCATION '/hive/';
> 
> 
> Java Code: -
> Not sure whether this will have an impact, but the code is part of a 
> Spring Batch tasklet being triggered from the batch context. This tasklet 
> runs in parallel with other tasklets.
> 
> 
> public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
>     throws Exception {
>   String[] args = {"-d", BeeLine.BEELINE_DEFAULT_JDBC_DRIVER,
>       "-u", "jdbc:hive2://:1/test_db",
>       "-n", "**", "-p", "**",
>       "-f", "C://Work//test_hive.hql"};
> 
>   BeeLine beeline = new BeeLine();
>   ByteArrayOutputStream os = new ByteArrayOutputStream();
>   PrintStream beelineOutputStream = new PrintStream(os);
>   beeline.setOutputStream(beelineOutputStream);
>   beeline.setErrorStream(beelineOutputStream);
>   beeline.begin(args, null);
>   String output = os.toString("UTF8");
>   System.out.println(output);
> 
>   return RepeatStatus.FINISHED;
> }
> 
> It will be great if you can share the piece of code that worked for you.
> May be it will give me some pointers on how to go ahead.
> 
> 
> Best Regards,
> Amal
> 
> 
> From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
> 
> Sent: Sunday, March 22, 2015 2:58 AM
> To: user@hive.apache.org
> Subject: RE: Executing HQL files from JAVA application.
> 
> 
> 
> Hi Amal;
> 
> Coming from a relational database (Oracle, Sybase) background, I always 
> expect that a DDL statement like DROP TABLE has to run in its own 
> transaction and cannot be combined with a DML statement.
> 
> Now I suspect that when you run the command DROP TABLE IF EXISTS 
> ;  like below in beeline, it works
> 
> 0: jdbc:hive2://rhes564:10010/default> drop table if exists mytest;
> No rows affected (0.216 seconds)
> 
> That runs in its own transaction so it works. However, I suspect in JAVA that 
> is not the case. Can you possibly provide your JAVA code to see 

Re: Delete ORC partition

2015-03-24 Thread Steve Howard
What happens if you try it with the actual colon rather than the encoded
value?

 alter table my_tbl drop partition (date='2014-01-02 00:00:00.0') ;

On Tue, Mar 24, 2015 at 12:37 AM, Megha Garg  wrote:

> Thanks for correcting, that was a typo. My actual command is :-
>
>  alter table my_tbl drop partition (date='2014-01-02 00%3A00%3A00.0') ;
>
> On Tue, Mar 24, 2015 at 9:35 AM, Steve Howard 
> wrote:
>
>> Do you have a typo in the partition name?  There is a space in the list
>> you have between day and hour, but not in your drop statement.  Also, %3A
>> is hex for the ":" character, but you don't have that in the partition
>> name you are trying to drop.
>>
>> Sent from my iPad
>>
>> On Mar 23, 2015, at 11:46 PM, Megha Garg  wrote:
>>
>> I am not getting any error and my hive version is 0.13
>>
>> On Mon, Mar 23, 2015 at 8:57 PM, Alan Gates  wrote:
>>
>>> Are you getting an error or does the partition just not get deleted?  If
>>> you get an error message can you share it?  What version of Hive are you
>>> using?
>>>
>>> Alan.
>>>
>>> Megha Garg, March 23, 2015 at 5:43:
>>> Hi,
>>>
>>> I am new to hive. I have created one ORC table with partitioning where
>>> my partition looks like below:-
>>>
>>> *date=2014-01-01 00%3A00%3A00.0*
>>> *date=2014-01-02 00%3A00%3A00.0*
>>> *date=2014-01-03 00%3A00%3A00.0*
>>>
>>> I want to delete my second partition (date=2014-01-02 00%3A00%3A00.0)
>>> but i am not able to do so. I am using the below query:-
>>>
>>> * alter table my_tbl drop  partition (date='2014-01-0200.00.00.0') ;*
>>>
>>> But it is not working.  How can i delete it?
>>>
>>>
>>
>


Re: Delete ORC partition

2015-03-23 Thread Steve Howard
Do you have a typo in the partition name?  There is a space in the list you 
have between day and hour, but not in your drop statement.  Also, %3A is hex 
for the ":" character, but you don't have that in the partition name you are 
trying to drop.

Sent from my iPad

> On Mar 23, 2015, at 11:46 PM, Megha Garg  wrote:
> 
> I am not getting any error and my hive version is 0.13
> 
>> On Mon, Mar 23, 2015 at 8:57 PM, Alan Gates  wrote:
>> Are you getting an error or does the partition just not get deleted?  If you 
>> get an error message can you share it?  What version of Hive are you using?
>> 
>> Alan.
>> 
>>>Megha Garg  March 23, 2015 at 5:43
>>> Hi,
>>> 
>>> I am new to hive. I have created one ORC table with partitioning where my 
>>> partition looks like below:-
>>> 
>>> date=2014-01-01 00%3A00%3A00.0
>>> date=2014-01-02 00%3A00%3A00.0
>>> date=2014-01-03 00%3A00%3A00.0
>>> 
>>> I want to delete my second partition (date=2014-01-02 00%3A00%3A00.0) but i 
>>> am not able to do so. I am using the below query:-
>>> 
>>>  alter table my_tbl drop  partition (date='2014-01-0200.00.00.0') ; 
>>> 
>>> But it is not working.  How can i delete it?
> 


Re: Executing HQL files from JAVA application.

2015-03-21 Thread Steve Howard
There are more elegant ways, I am sure, but you could also use a 
java.io.BufferedReader to read the file content into a string and execute it 
much as you would a hard-coded SQL statement in your class.
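
Roughly like this; a sketch, not production code (splitting on ';' is naive
and breaks on semicolons inside string literals, and the URL, credentials,
and path are illustrative):

import java.nio.file.*;
import java.sql.*;

// Sketch of the read-the-file idea: slurp the .hql file into a string and
// run each statement over JDBC, so failures surface as SQLExceptions in
// the Java app.  URL, credentials, and path are illustrative.
public class RunHqlFile {
  public static void main(String[] args) throws Exception {
    String script = new String(
        Files.readAllBytes(Paths.get("/hive/scripts/demoHiveTable.hql")));
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://hs2host:10000/default", "user", "pwd");
         Statement st = conn.createStatement()) {
      for (String sql : script.split(";")) {  // naive statement splitting
        if (!sql.trim().isEmpty()) {
          st.execute(sql.trim());
        }
      }
    }
  }
}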

Sent from my iPad

> On Mar 21, 2015, at 5:04 AM, Amal Gupta  wrote:
> 
> Hi Everyone,
>  
> I am trying to execute a hive *.hql file from a java application.  I had 
> tried a couple of ways of doing it, through the JDBC driver for hive and 
> through the spring jdbc template, but the only way that was successful for 
> me was to create a runtime process and then execute it.
>  
> The java code to do that is:
>  
> Process p = Runtime.getRuntime().exec(
>     new String[]{"hive", "-f", "/hive/scripts/demoHiveTable.hql"});
>  
> Although this works, I wanted to know if there is a better way to do it 
> using some driver functionality or any other API for hive. The method I 
> used will do the execution for me, but failures while executing the hql 
> script will not be visible to the Java application.
>  
> Any pointers or suggestions will be greatly helpful. Thanks in advance.
>  
> Please let me know in case you need any other details from me.
>  
> Regards,
> Amal


sqoop import to hive being killed by resource manager

2015-03-12 Thread Steve Howard
Hi All,

We have not been able to get what is in the subject line to run.  This is
on hive 0.14.  While pulling a billion-row table from Oracle using 12
splits on the primary key, each map task continually runs out of memory,
as below...

15/03/13 00:22:23 INFO mapreduce.Job: Task Id :
attempt_1426097251374_0011_m_11_0, Status : FAILED
Container [pid=27919,containerID=container_1426097251374_0011_01_13] is
running beyond physical memory limits. Current usage: 513.5 MB of 512 MB
physical memory used; 879.3 MB of 1.0 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1426097251374_0011_01_13 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 28078 27919 27919 27919 (java) 63513 834 912551936 131129
/usr/jdk64/jdk1.7.0_45/bin/java -server -XX:NewRatio=8
-Djava.net.preferIPv4Stack=true -Dhdp.version=2.2.0.0-2041 -Xmx410m
-Djava.io.tmpdir=/mnt/hdfs/hadoop/yarn/local/usercache/hdfs/appcache/application_1426097251374_0011/container_1426097251374_0011_01_13/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/mnt/hdfs/hadoop/yarn/log/application_1426097251374_0011/container_1426097251374_0011_01_13
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 172.27.2.57 52335
attempt_1426097251374_0011_m_11_0 13
|- 27919 27917 27919 27919 (bash) 1 2 9424896 317 /bin/bash -c
/usr/jdk64/jdk1.7.0_45/bin/java -server -XX:NewRatio=8
-Djava.net.preferIPv4Stack=true -Dhdp.version=2.2.0.0-2041 -Xmx410m
-Djava.io.tmpdir=/mnt/hdfs/hadoop/yarn/local/usercache/hdfs/appcache/application_1426097251374_0011/container_1426097251374_0011_01_13/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/mnt/hdfs/hadoop/yarn/log/application_1426097251374_0011/container_1426097251374_0011_01_13
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 172.27.2.57 52335
attempt_1426097251374_0011_m_11_0 13
1>/mnt/hdfs/hadoop/yarn/log/application_1426097251374_0011/container_1426097251374_0011_01_13/stdout
2>/mnt/hdfs/hadoop/yarn/log/application_1426097251374_0011/container_1426097251374_0011_01_13/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

We have tried several different sizes for various switches, but the job
always fails.
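
For reference, the two knobs in that dump are the container size and the
map JVM heap inside it (the 512 MB limit and -Xmx410m above); for example,
on the sqoop command line, with illustrative values and the heap at roughly
80% of the container:

sqoop import \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.map.java.opts=-Xmx1638m \
  --connect jdbc:oracle:thin:@dbhost:1521/dbservice \
  --table BIG_TABLE --split-by ID --num-mappers 12 \
  --hive-import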

Is this simply a function of the data, or is there another issue?

Thanks,

Steve