Re: Transaction deadlocks

2015-09-22 Thread Steve Howard
Am I missing it, or do you reset the deadlock count to 0 every time the
lock method is entered, so the count of ten is never reached in checkRetryable()?

  public LockResponse lock(LockRequest rqst)
throws NoSuchTxnException, TxnAbortedException, MetaException
  {
>>>this.deadlockCnt = 0; <<<
try
{
  Connection dbConn = null;
  try
  {
dbConn = getDbConn(8);
return lock(dbConn, rqst, true);
  }
  catch (SQLException e)
  {
LOG.debug("Going to rollback");
rollbackDBConn(dbConn);
checkRetryable(dbConn, e, "lock(" + rqst + ")");
throw new MetaException("Unable to update transaction database " +
StringUtils.stringifyException(e));
  }
  finally
  {
closeDbConn(dbConn);
  }
}
catch (RetryException e)
{
  return lock(rqst);
}
  }
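
A minimal, self-contained sketch of the pattern in question (hypothetical
names and a simulated failure, not the actual TxnHandler code): because the
counter is reset on every entry and the retry path re-enters the same
method, the limit check can never trip, and the recursion ends only when
the stack is exhausted.

public class RetryResetSketch {
  private static int deadlockCnt;
  private static int depth;

  static void lock() {
    deadlockCnt = 0;          // reset on every (re-)entry, as in lock(LockRequest)
    depth++;
    try {
      doWork();               // always "deadlocks" in this sketch
    } catch (RuntimeException retry) {
      lock();                 // recursive retry, as in catch (RetryException)
    }
  }

  static void doWork() {
    if (deadlockCnt++ < 10) { // the limit check from checkRetryable()
      throw new RuntimeException("deadlock, retrying");
    }
    // never reached: deadlockCnt is always 0 on entry, so the test is always true
  }

  public static void main(String[] args) {
    try {
      lock();
    } catch (StackOverflowError e) {
      System.out.println("stack exhausted after " + depth + " recursive calls");
    }
  }
}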


On Tue, Sep 22, 2015 at 9:24 PM, Steve Howard 
wrote:

> Hi Eugene,
>
> This is HDP 2.3, 1.2.1.2.3.0.0-2557.
>
> I see the deadlock count is 10, and we haven't overridden
> ConfVars.HMSHANDLERATTEMPTS, which also defaults to 10.  I am at a loss as to why
> the call stack would be that large (I think it is 1,000), as that indicates
> to me it keeps failing to get the lock and keeps calling itself.
>
>   protected void checkRetryable(Connection conn, SQLException e, String
> caller)
> throws TxnHandler.RetryException, MetaException
>   {
> if ((this.dbProduct == null) && (conn != null)) {
>   determineDatabaseProduct(conn);
> }
> if ((e instanceof SQLTransactionRollbackException) ||
> (((this.dbProduct == DatabaseProduct.MYSQL) || (this.dbProduct ==
> DatabaseProduct.POSTGRES) || (this.dbProduct ==
> DatabaseProduct.SQLSERVER)) && (e.getSQLState().equals("40001"))) ||
> ((this.dbProduct == DatabaseProduct.POSTGRES) &&
> (e.getSQLState().equals("40P01"))) ||
> ((this.dbProduct == DatabaseProduct.ORACLE) &&
> ((e.getMessage().contains("deadlock detected")) ||
> (e.getMessage().contains("can't serialize access for this transaction")))))
> {
>   if (this.deadlockCnt++ < 10)
>   {
> long waitInterval = this.deadlockRetryInterval * this.deadlockCnt;
> LOG.warn("Deadlock detected in " + caller + ". Will wait " +
> waitInterval + "ms try again up to " + (10 - this.deadlockCnt + 1) + "
> times.");
> try
> {
>   Thread.sleep(waitInterval);
> }
> catch (InterruptedException ie) {}
> throw new RetryException();
>   }
>   LOG.error("Too many repeated deadlocks in " + caller + ", giving
> up.");
>   this.deadlockCnt = 0;
> }
> else if (isRetryable(e))
> {
>   if (this.retryNum++ < this.retryLimit)
>   {
> LOG.warn("Retryable error detected in " + caller + ".  Will wait "
> + this.retryInterval + "ms and retry up to " + (this.retryLimit -
> this.retryNum + 1) + " times.  Error: " + getMessage(e));
> try
> {
>   Thread.sleep(this.retryInterval);
> }
> catch (InterruptedException ex) {}
> throw new RetryException();
>   }
>   LOG.error("Fatal error. Retry limit (" + this.retryLimit + ")
> reached. Last error: " + getMessage(e));
>   this.retryNum = 0;
> }
> else
> {
>   this.deadlockCnt = 0;
>   this.retryNum = 0;
> }
>   }
>
> Thanks,
>
> Steve
>
> On Tue, Sep 22, 2015 at 6:22 PM, Eugene Koifman 
> wrote:
>
>> What version of Hive are you running?  In the current codebase at least,
>> TxnHandler.checkRetryable() keeps track of how many retries it allowed – it
>> will not retry forever.
>>
>>
>> From: Steve Howard 
>> Reply-To: "user@hive.apache.org" 
>> Date: Tuesday, September 22, 2015 at 1:32 PM
>> To: "user@hive.apache.org" 
>> Subject: Transaction deadlocks
>>
>>
>>
>> Thread A…
>>
>>
>>
>>
>>
>> "HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0
>> tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]
>>
>>java.lang.Thread.State: RUNNABLE
>>
>> at java.net.SocketInputStream.socketRead0(Native Method)
>>
>> at
>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>
>> at java.net.SocketInputStream.read(SocketInputStream.java:170)
>>
>> at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>
>> at oracle.net.ns.Packet.receive(Packet.java:300)
>>
>> at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
>>
>> at
>> oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
>>
>> at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
>>
>> at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
>>
>> at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
>>
>> at
>> oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
>>
>> at
>> oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
>>
>> at
>> oracle.jdbc.dr

Re: Transaction deadlocks

2015-09-22 Thread Steve Howard
Hi Eugene,

This is HDP 2.3, 1.2.1.2.3.0.0-2557.

I see the deadlock count is 10, and we haven't overridden
ConfVars.HMSHANDLERATTEMPTS, which also defaults to 10.  I am at a loss as to why
the call stack would be that large (I think it is 1,000), as that indicates
to me it keeps failing to get the lock and keeps calling itself.

  protected void checkRetryable(Connection conn, SQLException e, String
caller)
throws TxnHandler.RetryException, MetaException
  {
if ((this.dbProduct == null) && (conn != null)) {
  determineDatabaseProduct(conn);
}
if ((e instanceof SQLTransactionRollbackException) ||
(((this.dbProduct == DatabaseProduct.MYSQL) || (this.dbProduct ==
DatabaseProduct.POSTGRES) || (this.dbProduct ==
DatabaseProduct.SQLSERVER)) && (e.getSQLState().equals("40001"))) ||
((this.dbProduct == DatabaseProduct.POSTGRES) &&
(e.getSQLState().equals("40P01"))) ||
((this.dbProduct == DatabaseProduct.ORACLE) &&
((e.getMessage().contains("deadlock detected")) ||
(e.getMessage().contains("can't serialize access for this transaction")))))
{
  if (this.deadlockCnt++ < 10)
  {
long waitInterval = this.deadlockRetryInterval * this.deadlockCnt;
LOG.warn("Deadlock detected in " + caller + ". Will wait " +
waitInterval + "ms try again up to " + (10 - this.deadlockCnt + 1) + "
times.");
try
{
  Thread.sleep(waitInterval);
}
catch (InterruptedException ie) {}
throw new RetryException();
  }
  LOG.error("Too many repeated deadlocks in " + caller + ", giving
up.");
  this.deadlockCnt = 0;
}
else if (isRetryable(e))
{
  if (this.retryNum++ < this.retryLimit)
  {
LOG.warn("Retryable error detected in " + caller + ".  Will wait "
+ this.retryInterval + "ms and retry up to " + (this.retryLimit -
this.retryNum + 1) + " times.  Error: " + getMessage(e));
try
{
  Thread.sleep(this.retryInterval);
}
catch (InterruptedException ex) {}
throw new RetryException();
  }
  LOG.error("Fatal error. Retry limit (" + this.retryLimit + ")
reached. Last error: " + getMessage(e));
  this.retryNum = 0;
}
else
{
  this.deadlockCnt = 0;
  this.retryNum = 0;
}
  }
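
For reference, the backoff checkRetryable() applies is linear:
waitInterval = deadlockRetryInterval * deadlockCnt.  A rough sketch of the
intended schedule, assuming a hypothetical base interval of 1000 ms (the
real value comes from configuration) and assuming the counter is not reset
between retries:

public class BackoffSketch {
  public static void main(String[] args) {
    long deadlockRetryInterval = 1000L; // assumed base interval (ms), illustration only
    for (int deadlockCnt = 1; deadlockCnt <= 10; deadlockCnt++) {
      long waitInterval = deadlockRetryInterval * deadlockCnt;
      System.out.println("attempt " + deadlockCnt + ": wait " + waitInterval + " ms");
    }
    // worst case before giving up: 1000 * (1 + 2 + ... + 10) ms = 55 seconds
  }
}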

Thanks,

Steve

On Tue, Sep 22, 2015 at 6:22 PM, Eugene Koifman 
wrote:

> What version of Hive are you running?  In the current codebase at least,
> TxnHandler.checkRetryable() keeps track of how many retries it allowed – it
> will not retry forever.
>
>
> From: Steve Howard 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, September 22, 2015 at 1:32 PM
> To: "user@hive.apache.org" 
> Subject: Transaction deadlocks
>
>
>
> Thread A…
>
>
>
>
>
> "HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0
> tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]
>
>java.lang.Thread.State: RUNNABLE
>
> at java.net.SocketInputStream.socketRead0(Native Method)
>
> at
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
>
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
>
> at oracle.net.ns.Packet.receive(Packet.java:300)
>
> at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
>
> at
> oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
>
> at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
>
> at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
>
> at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
>
> at
> oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
>
> at
> oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
>
> at
> oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
>
> at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
>
> at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
>
> at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
>
> at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
>
> at
> oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1033)
>
> at
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1329)
>
> at
> oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1838)
>
> at
> oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:1803)
>
> - locked <0xc09fcda0> (a oracle.jdbc.driver.T4CConnection)
>
> at
> oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:294)
>
> at
> org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)
>
> at
> org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatem

Re: Transaction deadlocks

2015-09-22 Thread Eugene Koifman
What version of Hive are you running?  In the current codebase at least, 
TxnHandler.checkRetryable() keeps track of how many retries it allowed – it 
will not retry forever.


From: Steve Howard <stevedhow...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, September 22, 2015 at 1:32 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Transaction deadlocks


Thread A…


"HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0 
tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at oracle.net.ns.Packet.receive(Packet.java:300)
at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
at 
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
at 
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
at 
oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1033)
at 
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1329)
at 
oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1838)
at 
oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:1803)
- locked <0xc09fcda0> (a oracle.jdbc.driver.T4CConnection)
at 
oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:294)
at 
org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)
at 
org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1432)
- locked <0xc09fcc28> (a java.lang.Object)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:422)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)



…blocks Thread B, resulting in a deadlock message being written to the logs 
over and over again…




"HiveServer2-Background-Pool: Thread-51" #51 prio=5 os_prio=0 
tid=0x0279f000 nid=0x7227 waiting for monitor entry [0x7fd146a47000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1361)
- waiting to lock <0xc09fcc28> (a java.lang.Object)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:422)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)



…but my guess is the bigger issue is that eventually the stack will be 
exhausted, as the method below recursively calls itself (notice the stack 
above)…


  public LockResponse lock

Re: mapjoin with left join

2015-09-22 Thread Steve Howard
Hi Gopal/All,

Yep, I absolutely understand the limitation of what we are trying to do.
We will try the settings you suggested.

Thanks,

Steve

On Tue, Sep 22, 2015 at 1:44 PM, Gopal Vijayaraghavan 
wrote:

>
> > select small.* from small s left join large l on s.id =
> > l.id where l.id is null;
> ...
> > We simply want to load the 81K rows into RAM, then for each row in
> > large, check the small hash table, and if the row in small is not in
> > large, then add it to large.
>
> That seems like a fair description of the problem with the approach, but
> it is fairly hard to do left outers in a distributed fashion with
> correctness.
>
> Imagine I load partition #1 from the big table + the small table into a
> task - how would I know partition #2 doesn't have a key from the small
> hashtable. So the system cannot output any rows in the naive version of
> this problem until it compares the small table to all parts of the big
> table.
>
> That I think was Sergey's point & I don't know of an easy way around that
> yet.
>
> > The business case is loading only new rows into a large fact table.  The
> >new rows are the ones that are small in number.
>
> That, however, is much easier, because it's better written as:
>
> insert into large select * from small where id NOT IN (select id from
> large);
>
> FYI, we want people to stop writing this sort of query once the MERGE
> syntax lands in HIVE - https://issues.apache.org/jira/browse/HIVE-10924
>
>
> The trick is to rewrite the NOT IN condition as two set operations - inner
> join + remainder left outer.
>
> Find all ids which already exist in bigtable with an inner map-join.
> Remove all those ids from the small table & insert the remainder.
>
> explain rewrite select remainder.* from small remainder where id not in
> (select id from small, large where small.id = large.id);
>
> That query, you will find, will run much faster than the query you're
> currently using.
>
> If you can, please send the "explain rewrite" & I can probably fine-tune
> this approach further.
>
> If you're using Tez, you might want to try out the custom vertex managers
> as well
>
> set hive.vectorized.execution.mapjoin.minmax.enabled=true;
>
> set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;
> set hive.optimize.dynamic.partition.hashjoin=true;
>
> Those params should allow you to scale up a map-join to ~100x the
> available RAM (aimed at LLAP memory utilization).
>
> Cheers,
> Gopal
>
>
>


Transaction deadlocks

2015-09-22 Thread Steve Howard
Thread A…





"HiveServer2-Background-Pool: Thread-35" #35 prio=5 os_prio=0
tid=0x7fd150e4 nid=0x2c97 runnable [0x7fd146e0a000]

   java.lang.Thread.State: RUNNABLE

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)

at java.net.SocketInputStream.read(SocketInputStream.java:170)

at java.net.SocketInputStream.read(SocketInputStream.java:141)

at oracle.net.ns.Packet.receive(Packet.java:300)

at oracle.net.ns.DataPacket.receive(DataPacket.java:106)

at
oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)

at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)

at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)

at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)

at
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)

at
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)

at
oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)

at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)

at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)

at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)

at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)

at
oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:1033)

at
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1329)

at
oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1838)

at
oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:1803)

- locked <0xc09fcda0> (a oracle.jdbc.driver.T4CConnection)

at
oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:294)

at
org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)

at
org.apache.commons.dbcp.DelegatingStatement.executeUpdate(DelegatingStatement.java:228)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1432)

- locked <0xc09fcc28> (a java.lang.Object)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:422)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)





…blocks Thread B, resulting in a deadlock message being written to the logs
over and over again…









"HiveServer2-Background-Pool: Thread-51" #51 prio=5 os_prio=0
tid=0x0279f000 nid=0x7227 waiting for monitor entry
[0x7fd146a47000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1361)

- waiting to lock <0xc09fcc28> (a java.lang.Object)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:422)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)

at
org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:433)





…but my guess is the bigger issue is that eventually the stack will be
exhausted, as the method below recursively calls itself (notice the stack
above)…





  public LockResponse lock(LockRequest rqst)

throws NoSuchTxnException, TxnAbortedException, MetaException

  {

this.deadlockCnt = 0;

try

{

  Connection dbConn = null;

  try

  {

dbConn = getDbConn(8);

return lock(dbConn, rqst, true);

  }

  catch (SQLException e)

  {

LOG.debug("Going to rollback");

rollbackDBConn(dbConn);

checkRetryable(dbConn, e, "lock(" + rqst + ")");

throw 

Re: mapjoin with left join

2015-09-22 Thread Gopal Vijayaraghavan

> select small.* from small s left join large l on s.id =
> l.id where l.id is null;
...
> We simply want to load the 81K rows into RAM, then for each row in
> large, check the small hash table, and if the row in small is not in
> large, then add it to large.

That seems like a fair description of the problem with the approach, but
it is fairly hard to do left outers in a distributed fashion with
correctness.

Imagine I load partition #1 from the big table + the small table into a
task - how would I know partition #2 doesn't have a key from the small
hashtable. So the system cannot output any rows in the naive version of
this problem until it compares the small table to all parts of the big
table.

That I think was Sergey's point & I don't know of an easy way around that
yet.
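
A toy illustration of that point with made-up keys (plain Java sets
standing in for tasks and partitions): a task that sees only one partition
of the big table would wrongly emit a small-table key that lives in a
partition it never sees.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionedAntiJoinSketch {
  public static void main(String[] args) {
    Set<Integer> small = new HashSet<>(Arrays.asList(5, 6));
    List<Set<Integer>> largePartitions = Arrays.asList(
        new HashSet<>(Arrays.asList(1, 2, 3)),  // partition #1
        new HashSet<>(Arrays.asList(4, 6)));    // partition #2 contains key 6

    // Naive: a task that sees only partition #1 emits [5, 6] -- wrong, 6 exists.
    Set<Integer> naive = new HashSet<>(small);
    naive.removeAll(largePartitions.get(0));
    System.out.println("partition #1 alone would insert: " + naive);

    // Correct: nothing can be emitted until small is compared to ALL partitions.
    Set<Integer> correct = new HashSet<>(small);
    for (Set<Integer> partition : largePartitions) {
      correct.removeAll(partition);
    }
    System.out.println("after comparing all partitions: " + correct); // [5]
  }
}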

> The business case is loading only new rows into a large fact table.  The
>new rows are the ones that are small in number.

That, however, is much easier, because it's better written as:

insert into large select * from small where id NOT IN (select id from
large);

FYI, we want people to stop writing this sort of query once the MERGE
syntax lands in HIVE - https://issues.apache.org/jira/browse/HIVE-10924


The trick is to rewrite the NOT IN condition as two set operations - inner
join + remainder left outer.

Find all ids which already exist in bigtable with an inner map-join.
Remove all those ids from the small table & insert the remainder.
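
Stated as plain set operations (a sketch with made-up keys; in Hive the
first step is the inner map-join and the second yields the remainder to
insert):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NotInRewriteSketch {
  public static void main(String[] args) {
    Set<Integer> small = new HashSet<>(Arrays.asList(1, 2, 3, 4));
    Set<Integer> large = new HashSet<>(Arrays.asList(2, 4, 6, 8));

    // Inner join: ids from the small side that already exist in the big table.
    Set<Integer> alreadyInLarge = new HashSet<>(small);
    alreadyInLarge.retainAll(large);

    // Remainder: drop those ids from small; what is left is safe to insert.
    Set<Integer> toInsert = new HashSet<>(small);
    toInsert.removeAll(alreadyInLarge);

    System.out.println(toInsert); // prints [1, 3]
  }
}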

explain rewrite select remainder.* from small remainder where id not in
(select id from small, large where small.id = large.id);

That query, you will find, will run much faster than the query you're
currently using.

If you can, please send the "explain rewrite" & I can probably fine-tune
this approach further.

If you're using Tez, you might want to try out the custom vertex managers
as well

set hive.vectorized.execution.mapjoin.minmax.enabled=true;

set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;
set hive.optimize.dynamic.partition.hashjoin=true;

Those params should allow you to scale up a map-join to ~100x the
available RAM (aimed at LLAP memory utilization).
 
Cheers,
Gopal




Error while executing hive Query : org.apache.hadoop.hive.serde2.io.HiveDecimalWritable cannot be cast to org.apache.hadoop.io.IntWritable

2015-09-22 Thread @Sanjiv Singh
Hi Folks,

I am running a query on Hive. From the logs, it seems to be failing
because of an erroneous record in table [*date_dim*] that causes a casting
exception. When I looked at other records in [*date_dim*], nothing seemed
different about this particular record (see the sample record at the end).

Can you please help me understand why this error occurs for this
particular record, and how it can be resolved?
Any help is highly appreciated!



*Hive query : *



*SELECT 'store' AS channel, 'ss_cdemo_sk' AS col_name, d_year, d_qoy,
i_category, ss_ext_sales_price AS ext_sales_price  FROM  store_sales ,
item ,  date_dim WHERE ss_cdemo_sk IS NULL  AND ss_sold_date_sk =
d_date_sk  AND ss_item_sk = i_item_sk ;*


*Hive logs : *

Query ID = root_20150922151717_a94d4679-224b-41f3-8336-a799d4ebedab
Total jobs = 4



Launching Job 3 out of 4
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1441794795162_13426, Tracking URL =
http://IMPETUS-DSRV03.impetus.co.in:8035/proxy/application_1441794795162_13426/
Kill Command = /opt/hes/hadoop/hadoop-2.6.0//bin/hadoop job  -kill
job_1441794795162_13426
Hadoop job information for Stage-6: number of mappers: 1; number of
reducers: 0
2015-09-22 15:18:34,291 Stage-6 map = 0%,  reduce = 0%
2015-09-22 15:18:54,541 Stage-6 map = 100%,  reduce = 0%
Ended Job = job_1441794795162_13426 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1441794795162_13426_m_00 (and more) from job
job_1441794795162_13426

Task with the most failures(4):
-
Task ID:
  task_1441794795162_13426_m_00

URL:

http://IMPETUS-DSRV03.impetus.co.in:8088/taskdetails.jsp?jobid=job_1441794795162_13426&tipid=task_1441794795162_13426_m_00
-
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
{"d_date_sk":2452538,"d_date_id":"KDMGFCAA","d_date":"2002-09-20","d_month_seq":1232,"d_week_seq":5360,"d_quarter_seq":412,"d_year":2002,"d_dow":5,"d_moy":9,"d_dom":20,"d_qoy":3,"d_fy_year":2002,"d_fy_quarter_seq":412,"d_fy_week_seq":5360,"d_day_name":"Friday
","d_quarter_name":"N","d_holiday":"2002Q3","d_weekend":"N","d_following_holiday":"Y","d_first_dom":2452761,"d_last_dom":2452519,"d_same_day_ly":2452447,"d_same_day_lq":2452173,"d_current_day":"N","d_current_week":"N","d_current_month":"N","d_current_quarter":"N","d_current_year":"N"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing row
{"d_date_sk":2452538,"d_date_id":"KDMGFCAA","d_date":"2002-09-20","d_month_seq":1232,"d_week_seq":5360,"d_quarter_seq":412,"d_year":2002,"d_dow":5,"d_moy":9,"d_dom":20,"d_qoy":3,"d_fy_year":2002,"d_fy_quarter_seq":412,"d_fy_week_seq":5360,"d_day_name":"Friday
","d_quarter_name":"N","d_holiday":"2002Q3","d_weekend":"N","d_following_holiday":"Y","d_first_dom":2452761,"d_last_dom":2452519,"d_same_day_ly":2452447,"d_same_day_lq":2452173,"d_current_day":"N","d_current_week":"N","d_current_month":"N","d_current_quarter":"N","d_current_year":"N"}
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected
exception: org.apache.hadoop.hive.serde2.io.HiveDecimalWritable cannot be
cast to org.apache.hadoop.io.IntWritable
at
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 9 more
Caused by: java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.io.HiveDecimalWritable cannot be cast to
org.apache.hadoop.io.IntWritable
at
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36)
at
org.apache.hadoop.hive.ql.udf.gen

hive ORC wrong number of index entries error

2015-09-22 Thread Patrick Duin
Hi all,

I am struggling to understand a stack trace I am getting while trying to
write an ORC file:
I am using hive-0.13.0/hadoop-2.4.0.

2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask:
Ignoring exception during close for
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
java.lang.IllegalArgumentException: Column has wrong number of index
entries found: 
org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b
expected: 1
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2015-09-21 09:15:45,988 FATAL [main]
org.apache.hadoop.mapred.YarnChild: Error running child :
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at 
org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I've seen https://issues.apache.org/jira/browse/HIVE-9080 and I think
that might be related.

I am not using Hive directly, though; I am using a map-only job that
writes through OrcNewOutputFormat.

Any pointers would be appreciated. Has anyone seen this before?


Thanks,

 Patrick


Re: Hive Query failing !!!

2015-09-22 Thread Nitin Pawar
OK, sorry, my bad.
I had overlooked that your query is doing its joins via the WHERE clause.


On Tue, Sep 22, 2015 at 12:20 PM, @Sanjiv Singh 
wrote:

> Nitin,
>
> The following setting was already there in Hive:
> set hive.exec.mode.local.auto=false;
>
> Surprisingly, when I applied the following setting, it started working:
> set hive.auto.convert.join=true;
>
> Can you please help me understand what happened?
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Tue, Sep 22, 2015 at 11:41 AM, Nitin Pawar 
> wrote:
>
>> Can you try setting these
>> set hive.exec.mode.local.auto=false;
>>
>>
>> On Tue, Sep 22, 2015 at 11:25 AM, @Sanjiv Singh 
>> wrote:
>>
>>>
>>>
>>> *Hi Folks,*
>>>
>>>
>>> *I am running the given Hive query. It is giving an error while executing.
>>> Please help me get out of it and understand the possible reason for the error.*
>>>
>>> *Hive Query :*
>>>
>>> SELECT *
>>> FROM  store_sales ,  date_dim ,  store ,
>>> household_demographics ,  customer_address
>>> WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk AND
>>> store_sales.ss_store_sk = store.s_store_sk
>>> AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk  AND
>>> store_sales.ss_addr_sk = customer_address.ca_address_sk
>>> AND  ( date_dim.d_dom BETWEEN 1 AND 2 )
>>> AND (household_demographics.hd_dep_count = 3 OR
>>> household_demographics.hd_vehicle_count = -1 )
>>> AND date_dim.d_year  IN (1998, 1998 + 1 , 1998 + 2 )  AND store.s_city
>>> IN ('Midway','Fairview')  ;
>>>
>>>
>>> *Note : *
>>> All tables [store_sales ,  date_dim ,  store ,
>>> household_demographics ,  customer_address] are in ORC format.
>>> hive version  : 1.0.0
>>>
>>>
>>> *Additional note :*
>>> I also checked the Hive EXPLAIN for the same query. It is failing at the
>>> last stage, where it joins the intermediate result to customer_address.
>>> I also checked for null values in store_sales.ss_addr_sk and
>>> customer_address.ca_address_sk, which is not the case.
>>> I also changed the Hive log level to DEBUG; there is nothing specific in
>>> the log file regarding the error.
>>>
>>> I really want to understand why the Hive query is failing,
>>> how it can be resolved,
>>> and where to look.
>>> Any help is highly appreciated.
>>>
>>>
>>> *At Hive console :*
>>>
>>> Launching Job 4 out of 4
>>> Number of reduce tasks not specified. Estimated from input data size: 1
>>> In order to change the average load for a reducer (in bytes):
>>>   set hive.exec.reducers.bytes.per.reducer=<number>
>>> In order to limit the maximum number of reducers:
>>>   set hive.exec.reducers.max=<number>
>>> In order to set a constant number of reducers:
>>>   set mapreduce.job.reduces=<number>
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
>>> at
>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
>>> at
>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
>>> ...
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>> Job Submission failed with exception
>>> 'java.lang.NullPointerException(null)'
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>>> MapReduce Jobs Launched:
>>> Stage-Stage-5: Map: 2  Reduce: 1   Cumulative CPU: 4.08 sec   HDFS Read:
>>> 746 HDFS Write: 96 SUCCESS
>>> Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 3.32 sec   HDFS Read:
>>> 889 HDFS Write: 96 SUCCESS
>>> Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 3.21 sec   HDFS Read:
>>> 889 HDFS Write: 96 SUCCESS
>>>
>>>
>>>
>>>
>>> *Hive error (hive.log):*
>>>
>>> 2015-09-22 10:41:01,304 ERROR [main]: exec.Task
>>> (SessionState.java:printError(833)) - Job Submission failed with exception
>>> 'java.lang.NullPointerException(null)'
>>> java.lang.NullPointerException
>>> at
>>> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
>>> at
>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
>>> at
>>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
>>> at
>>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
>>> at
>>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
>>> at
>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
>>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>>> at