Ruben Agudo created SQOOP-3471:
----------------------------------
Summary: While doing sqoop-export mapper progress goes back
causing duplicated data
Key: SQOOP-3471
URL: https://issues.apache.org/jira/browse/SQOOP-3471
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.4.6
Reporter: Ruben Agudo
Attachments: image-2020-04-21-10-36-15-108.png
We are running the sqoop-export tool in Qubole to export data from S3
back to a SQL Server database.
Our issue is that sometimes one of the mappers in the map phase appears to
fail or restart (we are not sure which). We see its progress go backwards, as in the
following image:
!image-2020-04-21-10-36-15-108!
This is causing duplicates in our destination table. I'm a bit lost because in
the documentation it says that *"If an export map task fails due to these or
other reasons, it will cause the export job to fail."* and this is not the
behaviour we are seeing.
Unfortunately, we can't reproduce it consistently.
The command that we are running is:
sqoop export \
  -Dsqoop.export.records.per.statement=50000 \
  -Dsqoop.export.statements.per.transaction=100 \
  -Dsqoop.throwOnError=1 \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --connect connectionString \
  --table config.table \
  --export-dir config.source \
  --input-fields-terminated-by , \
  --num-mappers 8 \
  --columns theColumnsToCopy \
  --batch \
  --schema theSchema
I have replaced the values that I can't share for privacy reasons with placeholders.
What could cause a mapper's progress to go backwards? And, if that
happens, is it possible to make the sqoop export fail?
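As a minimal sketch of what "making the export fail" might look like, assuming the backward progress is a failed map task attempt that Hadoop retries (we have not confirmed this), the variant below adds the Hadoop property mapreduce.map.maxattempts with a value of 1, so that a single failed attempt would fail the job instead of being rerun. MAX_ATTEMPTS is a hypothetical variable name used only for illustration, the remaining arguments are the same redacted placeholders as in our command, and the script only prints the command rather than running it:

```shell
# Dry run: build the export command and print it instead of executing it.
# ASSUMPTION: the backward progress is a failed map attempt being retried.
MAX_ATTEMPTS=1  # hypothetical variable; 1 means a failed map task is not retried

# Store the command as positional parameters so each argument stays intact.
set -- sqoop export \
  -Dmapreduce.map.maxattempts="$MAX_ATTEMPTS" \
  -Dsqoop.export.records.per.statement=50000 \
  -Dsqoop.export.statements.per.transaction=100 \
  -Dsqoop.throwOnError=1 \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --connect connectionString \
  --table config.table \
  --export-dir config.source \
  --input-fields-terminated-by , \
  --num-mappers 8 \
  --columns theColumnsToCopy \
  --batch \
  --schema theSchema

# Print the assembled command for review before running it for real.
echo "$@"
```

Even if the assumption holds, rows committed before the failure would remain in the destination table, and attempts that are killed rather than failed (for example when a node is lost) may still be rescheduled, so this would be a partial safeguard at best.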
Also, if this isn't the correct channel for this, please let me know.
Thanks!