Dovy Paukstys created SQOOP-3445:
------------------------------------
Summary: Spark with Sqoop and Kite - Parquet Mismatch in Command?
Key: SQOOP-3445
URL: https://issues.apache.org/jira/browse/SQOOP-3445
Project: Sqoop
Issue Type: Bug
Components: sqoop2-kite-connector
Affects Versions: 1.4.7
Environment: System:
* Debian 9
* Hadoop 2.9
* Spark 2.3
Installed Dependencies (JARs):
* sqoop-1.4.7-hadoop260
* kite-data-mapreduce-1.1.0
* kite-hadoop-compatibility-1.1.0.jar
* kite-data-crunch-1.1.0
* kite-data-core-1.1.0
* avro-tools-1.8.2.jar
* mysql-connector-java-5.1.42
* parquet-tools-1.8.3
Reporter: Dovy Paukstys
Not sure if the error is deep in Sqoop or in Kite, so I cross-posted here:
[https://github.com/kite-sdk/kite/issues/490].
I am reading from a MySQL database and trying to write out to Parquet. When
writing to Avro there are no issues, but when Kite is involved (Parquet) all
hell breaks loose. First I had to manually add a ton of JARs to even get the
sucker to run, but that all seems resolved.
Also, please note that I have tried various versions of the installed
dependencies, downgrading and upgrading Sqoop accordingly.
When Sqoop is used without Kite (i.e., Avro, not Parquet) there are no issues.
The moment the job runs to export to Parquet, everything blows up. It seems
like Kite may be the offender, but the problem may be in how the Sqoop code
runs Kite.
Error:
{code:java}
19/07/09 17:55:28 INFO mapreduce.Job: Job job_1562682312457_0020 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Parquet only supports generic and specific data models, type parameter must implement IndexedRecord
    at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:96)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
    at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
    at org.kitesdk.data.Datasets.load(Datasets.java:108)
    at org.kitesdk.data.Datasets.load(Datasets.java:165)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:542)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:569)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/07/09 17:55:28 INFO mapreduce.Job: Counters: 2
{code}
Again, it only fails on the final conversion. I am not sure of the full details
since the command is inside a parallel process. Any direction would be
appreciated.