[ https://issues.apache.org/jira/browse/HIVE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793254#comment-13793254 ]
Justin commented on HIVE-4693:
------------------------------
We have several skewed dimensions, and not being able to use the skew join
optimization causes severe performance degradation. I've verified this by
testing with the skewed data removed. Is there a workaround? I've tried
setting hive.skewjoin.key, to no avail.
> If you set hive.optimize.skewjoin=true, and number of identical keys is <
> hive.skewjoin.key don't fail with FileNotFoundException
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-4693
> URL: https://issues.apache.org/jira/browse/HIVE-4693
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.10.0
> Reporter: Robert Justice
>
> We would like to set hive.optimize.skewjoin to true so that it is used when
> skew is encountered, but if the number of identical keys does not reach
> hive.skewjoin.key, the query crashes because hive_skew_join_bigkeys_0 cannot
> be found. Could we just bail out and fall back to a standard join rather
> than failing?
> Ended Job = job_201306061640_0003
> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-rjustice/hive_2013-06-07_10-02-03_755_605133549375679913/-mr-10003/hive_skew_join_bigkeys_0 does not exist.
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:410)
> at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
> at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1374)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Ended Job = 672729995, job is filtered out (removed at runtime).
> MapReduce Jobs Launched:
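
For illustration only, the "bail out instead of failing" behavior requested
above could look roughly like the sketch below. It is not the Hive code and
not a proposed patch: it uses java.nio.file on a local filesystem as a
stand-in for the HDFS listStatus call seen in the trace, and the class and
method names (SkewJoinFallbackSketch, listBigKeyFiles, needsSkewJoinFollowUp)
are hypothetical. The assumption is that a missing hive_skew_join_bigkeys_0
directory simply means no key crossed the hive.skewjoin.key threshold.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SkewJoinFallbackSketch {

    /**
     * Lists the files under the big-keys directory. If the directory does not
     * exist (assumed here to mean that no skewed key was written), returns an
     * empty list instead of throwing.
     */
    public static List<Path> listBigKeyFiles(Path bigKeysDir) throws IOException {
        if (!Files.isDirectory(bigKeysDir)) {
            return Collections.emptyList();
        }
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(bigKeysDir)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        return files;
    }

    /**
     * Hypothetical decision helper: a follow-up skew-join task is only needed
     * when at least one big-key file exists; otherwise the result of the
     * standard join is used as-is.
     */
    public static boolean needsSkewJoinFollowUp(Path bigKeysDir) throws IOException {
        return !listBigKeyFiles(bigKeysDir).isEmpty();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("skew-sketch");
        Path bigKeys = tmp.resolve("hive_skew_join_bigkeys_0");

        // Directory missing: fall back to the standard join result.
        System.out.println(needsSkewJoinFollowUp(bigKeys)); // prints "false"

        // Directory present with one file: a follow-up task would be needed.
        Files.createDirectory(bigKeys);
        Files.createFile(bigKeys.resolve("000000_0"));
        System.out.println(needsSkewJoinFollowUp(bigKeys)); // prints "true"
    }
}
```

The point of the sketch is only the existence check before listing; where
such a check would belong in ConditionalResolverSkewJoin.getTasks is left
open.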