[ https://issues.apache.org/jira/browse/SQOOP-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreyas Joshi updated SQOOP-3242:
---------------------------------
    Description: 
*Problem description:*

We are importing data from MySQL into ORC tables. A few days ago the import 
process started failing, and has been failing since. 

It's always the first mapper that fails. The stack trace from the output of the 
failed map task copying all the columns from the MySQL table is:

{noformat}
2017-10-05 10:08:44,794 WARN [DataStreamer for file 
/some_location/_SCRATCH0.6754802978897544/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122293_m_000000_0/part-m-00000]
 org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/some_location/_SCRATCH0.6754802978897544/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122293_m_000000_0/part-m-00000
 (inode 100779406): File does not exist. Holder 
DFSClient_attempt_1496703039885_122293_m_000000_0_1190864689_1 does not have 
any open files.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3313)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3169)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
2017-10-05 10:10:43,950 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring 
exception during close for 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@362018fd
java.lang.IllegalArgumentException: Column has wrong number of index entries 
found: 0 expected: 822
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:802)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1742)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2425)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
        at 
org.apache.hive.hcatalog.mapreduce.StaticPartitionFileRecordWriterContainer.close(StaticPartitionFileRecordWriterContainer.java:53)
        at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2012)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-10-05 10:10:43,950 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/some_location/_SCRATCH0.6754802978897544/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122293_m_000000_0/part-m-00000
 (inode 100779406): File does not exist. Holder 
DFSClient_attempt_1496703039885_122293_m_000000_0_1190864689_1 does not have 
any open files.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3313)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3169)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
{noformat}

*Source table:*
Here's what the source table in MySQL looks like:

{noformat}
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| id             | bigint(11)       | NO   | PRI | NULL    | auto_increment |
| user_id        | int(11) unsigned | YES  | MUL | NULL    |                |
| repository_id  | int(11) unsigned | YES  | MUL | NULL    |                |
| commit_count   | int(11) unsigned | YES  |     | 0       |                |
| committed_date | date             | YES  |     | NULL    |                |
| created_at     | datetime         | YES  |     | NULL    |                |
| updated_at     | datetime         | YES  |     | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+
{noformat}

*Destination table:*

Here's what the destination table in Hive looks like:

{noformat}
CREATE TABLE `test_db.test_table`(
  `id` bigint,
  `user_id` bigint,
  `repository_id` bigint,
  `commit_count` bigint,
  `committed_date` date,
  `created_at` timestamp,
  `updated_at` timestamp)
PARTITIONED BY (
  `snapshot_date` string,
  `snapshot_version` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
  'transactional'='true'
)
{noformat}
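
In case it helps reproduce this, the properties of the live table (the ORC serde and the 
{{transactional}} flag) can be confirmed with standard Hive commands; the database and 
table names are the ones from the DDL above:

{noformat}
USE test_db;
-- the 'transactional'='true' setting shows up under "Table Parameters"
DESCRIBE FORMATTED test_table;
{noformat}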

*More info:*

If the table in Hive is updated and only the {{id}} column is imported, the 
import succeeds. However, when the {{commit_count}} column is added (i.e. 
importing both {{id}} and {{commit_count}}), the import fails. The failure is 
slightly different from the one above:

{noformat}
2017-10-05 10:44:11,862 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring 
exception during close for 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@6f77bb11
java.nio.channels.ClosedChannelException
        at 
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1622)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
        at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$DirectStream.output(WriterImpl.java:464)
        at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:242)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2328)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2426)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
        at 
org.apache.hive.hcatalog.mapreduce.StaticPartitionFileRecordWriterContainer.close(StaticPartitionFileRecordWriterContainer.java:53)
        at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2012)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-10-05 10:44:11,863 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/some_path/_SCRATCH0.2200603632389685/snapshot_date=2017-09-25/snapshot_version=1506322835/_temporary/1/_temporary/attempt_1496703039885_122336_m_000000_0/part-m-00000
 (inode 100782116): File does not exist. Holder 
DFSClient_attempt_1496703039885_122336_m_000000_0_1708047031_1 does not have 
any open files.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3604)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3574)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:700)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy12.complete(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:443)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy13.complete(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2250)
        at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2234)
        at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
        at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
        at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2429)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
        at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
        at 
org.apache.hive.hcatalog.mapreduce.StaticPartitionFileRecordWriterContainer.close(StaticPartitionFileRecordWriterContainer.java:53)
        at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:667)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:790)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}

Sqoop invocation to import {{id}} only:

{noformat}
bin/sqoop import -Dmap.retry.exponentialBackOff=TRUE -Dmap.retry.numRetries=5 
--verbose --connect 'jdbc:mysql://host/db?serverTimezone=UTC&useSSL=false' 
--username username --password password --num-mappers 13 --hcatalog-home 
/usr/lib/hive/hcatalog --hcatalog-database test_db --hcatalog-table test_table 
--hcatalog-storage-stanza 'stored as orc tblproperties 
("orc.compress"="SNAPPY")' --query 'SELECT id FROM source_table WHERE 
$CONDITIONS AND committed_date > 2017-09-01' --boundary-query 'select MIN(id), 
MAX(id) from source_table' --split-by id --hcatalog-partition-keys 
snapshot_date,snapshot_version --hcatalog-partition-values 2017-09-25,1506322835
{noformat}

Sqoop invocation to import {{id}} and {{commit_count}}:

{noformat}
bin/sqoop import -Dmap.retry.exponentialBackOff=TRUE -Dmap.retry.numRetries=5 
--verbose --connect 'jdbc:mysql://host/db?serverTimezone=UTC&useSSL=false' 
--username username --password password --num-mappers 13 --hcatalog-home 
/usr/lib/hive/hcatalog --hcatalog-database test_db --hcatalog-table test_table 
--hcatalog-storage-stanza 'stored as orc tblproperties 
("orc.compress"="SNAPPY")' --query 'SELECT id, commit_count FROM source_table 
WHERE $CONDITIONS AND committed_date > 2017-09-01' --boundary-query 'select 
MIN(id), MAX(id) from source_table' --split-by id --hcatalog-partition-keys 
snapshot_date,snapshot_version --hcatalog-partition-values 2017-09-25,1506322835
{noformat}

* We seem to have better luck with a single mapper, but that takes too long to be 
a realistic solution (see the sketch below).
* We also tried disabling speculative execution; that didn't help either.
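
For reference, the single-mapper / no-speculation variant looks roughly like the 
following. It is the same connection string and two-column query as above, with only 
the mapper count and the (standard Hadoop 2) speculative-execution properties changed, 
so the exact flags in our runs may have differed slightly:

{noformat}
bin/sqoop import -Dmapreduce.map.speculative=false -Dmapreduce.reduce.speculative=false 
--verbose --connect 'jdbc:mysql://host/db?serverTimezone=UTC&useSSL=false' 
--username username --password password --num-mappers 1 --hcatalog-home 
/usr/lib/hive/hcatalog --hcatalog-database test_db --hcatalog-table test_table 
--hcatalog-storage-stanza 'stored as orc tblproperties 
("orc.compress"="SNAPPY")' --query 'SELECT id, commit_count FROM source_table 
WHERE $CONDITIONS AND committed_date > 2017-09-01' --boundary-query 'select 
MIN(id), MAX(id) from source_table' --split-by id --hcatalog-partition-keys 
snapshot_date,snapshot_version --hcatalog-partition-values 2017-09-25,1506322835
{noformat}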


> Import from MySQL to Hive table failing
> ---------------------------------------
>
>                 Key: SQOOP-3242
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3242
>             Project: Sqoop
>          Issue Type: Bug
>         Environment: Sqoop version 1.4.6
> MySQL version 5.7.19
> Hive version 1.2.0
> Debian Jessie
>            Reporter: Shreyas Joshi
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
