[
https://issues.apache.org/jira/browse/SQOOP-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284403#comment-17284403
]
jw commented on SQOOP-3471:
---------------------------
Hello, I suggest passing the parameter "-D
mapreduce.map.failures.maxpercent=0" so that the failure of a single map task
*causes the export job to fail*. After the failure, clean up the data that was
already written to the DB from the application layer, so that the export
remains idempotent at that level.
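
A rough sketch of that fail-then-clean-up flow, as a shell wrapper. The function names, the SQOOP_BIN override, and the cleanup hook are hypothetical and not part of Sqoop. The cleanup command is whatever your application uses to remove partially exported rows.

```shell
#!/usr/bin/env bash
# Sketch only: run_export, export_or_cleanup and SQOOP_BIN are
# hypothetical names for illustration, not Sqoop features.

# Run the export with zero tolerance for failed map tasks.
# All arguments are passed through to "sqoop export" unchanged.
run_export() {
  "${SQOOP_BIN:-sqoop}" export \
    -D mapreduce.map.failures.maxpercent=0 \
    "$@"
}

# Run the export; if it exits non-zero, call the caller-supplied
# cleanup command, then return the export's exit status.
export_or_cleanup() {
  local cleanup_cmd="$1"   # assumed: removes rows left by the failed run
  shift
  local rc=0
  run_export "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "export failed with status $rc, running cleanup" >&2
    "$cleanup_cmd"
  fi
  return "$rc"
}
```

It could be called as `export_or_cleanup my_cleanup --connect ... --table ... --export-dir ...`, where `my_cleanup` is a script or function you provide. Returning the export's status lets a scheduler decide whether to retry once the cleanup has run.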
> While doing sqoop-export mapper progress goes back causing duplicated data
> --------------------------------------------------------------------------
>
> Key: SQOOP-3471
> URL: https://issues.apache.org/jira/browse/SQOOP-3471
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.6
> Reporter: Ruben Agudo
> Priority: Major
> Attachments: image-2020-04-21-10-36-15-108.png
>
>
> We are running the sqoop-export tool in Qubole, to export some data from S3
> back to an SQL Server Database.
> Our issue is that sometimes one of the mappers in the map phase seems to
> fail or restart. Basically, we see the progress going backwards, as in
> the following image:
> !image-2020-04-21-10-36-15-108.png!
> This is causing duplicates in our destination table. I'm a bit lost because
> in the documentation it says that *"If an export map task fails due to these
> or other reasons, it will cause the export job to fail."* and this is not the
> behaviour we are seeing.
> Unfortunately we can't duplicate it in a consistent manner.
> The command that we are running is:
> sqoop export \
>   -Dsqoop.export.records.per.statement=50000 \
>   -Dsqoop.export.statements.per.transaction=100 \
>   -Dsqoop.throwOnError=1 \
>   --connection-manager org.apache.sqoop.manager.SQLServerManager \
>   --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>   --connect connectionString \
>   --table config.table \
>   --export-dir config.source \
>   --input-fields-terminated-by , \
>   --num-mappers 8 \
>   --columns theColumnsToCopy \
>   --batch \
>   --schema theSchema
> I removed the things that I can't add for privacy reasons.
> And the table we want to export contains 237,371,726 records.
> What could be the cause of the mapper going back in progress? And, if that
> happens, is it possible to make the sqoop export fail?
> Also, if this isn't the correct channel for this, please let me know.
> Thanks!