Re: Hive query started map task being killed during execution

2013-03-08 Thread Dean Wampler
Do you have more than one hive process running? It looks like you're using
Derby, which only supports one process at a time. Also, you have to start
Hive from the same directory every time, where the metastore database is
written, unless you edit the JDBC connection property in the Hive config
file to point to a particular path. Here's what I use:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
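
For reference, this property goes in Hive's configuration file, normally conf/hive-site.xml
under the Hive install; a minimal sketch, reusing the example path above:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
</configuration>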


On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.comwrote:

 Hi All,

 I am running a hive query which does insert into a table.
 From the symptoms it looks like it has to do with some settings, but I am
 not able to figure out which ones.

 When I submit the query, the job starts 2130 map tasks; the first 150
 complete without any error, then the next batch of 75 gets killed, and
 every task after that gets killed as well.
 When I submit a similar query on a smaller table, it starts only about 135
 map tasks, runs to completion without any error, and does the insert into
 the appropriate table.

 I don't find any obvious error messages in any of the task logs apart from
 these:


 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08
 08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher:
 Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator:
 StatsPublishing error: cannot connect to database

 Please suggest if I need to set anything in Hive when I invoke this query.
 The query that runs successfully processes far fewer rows than the one
 that fails.

 Thanks,
 DK




-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330


Re: Hive query started map task being killed during execution

2013-03-08 Thread Dileep Kumar
Thanks for your attention!
No, only one Hive process is running, and the thing that bothers me is that
the smaller query, which I invoke the same way, runs to completion. It is
using the embedded db; if that is the problem I can change it to an external
DB, but since my smaller query runs fine I thought this should be OK.
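
For what it's worth, if switching to an external metastore does become necessary, the
usual approach is to change the same JDO properties in hive-site.xml to point at a
networked database. A minimal sketch for MySQL, where the host metastorehost, the
database metastore, and the credentials are placeholders (nothing from this thread):

<!-- placeholders: adjust host, database, user, and password for your setup -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastorehost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>

(The MySQL connector jar also needs to be on Hive's classpath, e.g. in lib/.)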


On Fri, Mar 8, 2013 at 2:16 PM, Dean Wampler 
dean.wamp...@thinkbiganalytics.com wrote:

 Do you have more than one hive process running? It looks like you're using
 Derby, which only supports one process at a time. Also, you have to start
 Hive from the same directory every time, where the metastore database is
 written, unless you edit the JDBC connection property in the Hive config
 file to point to a particular path. Here's what I use:

 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
 </property>


 On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.comwrote:

 Hi All,

 I am running a hive query which does insert into a table.
 From the symptoms it looks like it has to do with some settings, but I am
 not able to figure out which ones.

 When I submit the query, the job starts 2130 map tasks; the first 150
 complete without any error, then the next batch of 75 gets killed, and
 every task after that gets killed as well.
 When I submit a similar query on a smaller table, it starts only about 135
 map tasks, runs to completion without any error, and does the insert into
 the appropriate table.

 I don't find any obvious error messages in any of the task logs apart
 from these:


 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08
 08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher:
 Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
 08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator:
 StatsPublishing error: cannot connect to database

 Please suggest if I need to set anything in Hive when I invoke this
 query. The query that runs successfully processes far fewer rows than the
 one that fails.

 Thanks,
 DK




 --
 *Dean Wampler, Ph.D.*
 thinkbiganalytics.com
 +1-312-339-1330




Re: Hive query started map task being killed during execution

2013-03-08 Thread Abdelrhman Shettia
Hi Dileep, 

Have you tried setting the following values in Hive and running the query
again? There is more info on why the query may fail at the following link:

https://cwiki.apache.org/Hive/statsdev.html


set hive.stats.autogather=false;

As well as:

set hive.stats.dbclass=jdbc:derby;
set hive.stats.dbconnectionstring=jdbc:derby:;databaseName=TempStatsStore;create=true;
set hive.stats.jdbcdriver=org.apache.derby.jdbc.EmbeddedDriver;
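
As a usage sketch, hive.stats.autogather can be turned off per session, right before the
failing statement; target_table and source_table below are placeholders rather than the
actual tables from this thread:

-- in the Hive CLI session that submits the large insert
set hive.stats.autogather=false;
INSERT INTO TABLE target_table
SELECT * FROM source_table;

Disabling autogather simply skips the stats-publishing step (the source of the
JDBCStatsPublisher/FileSinkOperator errors in the logs), which is usually the simpler
workaround.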

Hope this helps. 


 

 Abdelrahman Shettia
ashet...@hortonworks.com


On Mar 8, 2013, at 2:31 PM, Dileep Kumar dileepkumar...@gmail.com wrote:

 Thanks for your attention!
 No, only one Hive process is running, and the thing that bothers me is that
 the smaller query, which I invoke the same way, runs to completion. It is
 using the embedded db; if that is the problem I can change it to an external
 DB, but since my smaller query runs fine I thought this should be OK.
 
 
 On Fri, Mar 8, 2013 at 2:16 PM, Dean Wampler 
 dean.wamp...@thinkbiganalytics.com wrote:
 Do you have more than one hive process running? It looks like you're using 
 Derby, which only supports one process at a time. Also, you have to start 
 Hive from the same directory every time, where the metastore database is 
 written, unless you edit the JDBC connection property in the Hive config file 
 to point to a particular path. Here's what I use:
 
 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby:;databaseName=/Users/somedufus/hive/metastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
 </property>
 
 
 On Fri, Mar 8, 2013 at 4:09 PM, Dileep Kumar dileepkumar...@gmail.com wrote:
 Hi All,
 
 I am running a hive query which does insert into a table.
 From the symptoms it looks like it has to do with some settings, but I am
 not able to figure out which ones.
 
 When I submit the query, the job starts 2130 map tasks; the first 150
 complete without any error, then the next batch of 75 gets killed, and
 every task after that gets killed as well.
 When I submit a similar query on a smaller table, it starts only about 135
 map tasks, runs to completion without any error, and does the insert into
 the appropriate table.
 
 I don't find any obvious error messages in any of the task logs apart from
 these:
 
 
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001636_0/syslog:2013-03-08
  08:54:06,910 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:41:06,060 INFO org.apache.hadoop.hive.ql.exec.MapOperator:
 DESERIALIZE_ERRORS:0
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:46:54,390 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher:
 Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
 ./hadoop-0.20-mapreduce/userlogs/job_201303080834_0001/attempt_201303080834_0001_m_001646_0/syslog:2013-03-08
  08:46:54,394 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator:
 StatsPublishing error: cannot connect to database
 
 Please suggest if I need to set anything in Hive when I invoke this query.
 The query that runs successfully processes far fewer rows than the one that fails.
 
 Thanks,
 DK
 
 
 
 -- 
 Dean Wampler, Ph.D.
 thinkbiganalytics.com
 +1-312-339-1330